[jira] [Commented] (MESOS-5468) Add logic in long-lived-framework to handle network partitions.
[ https://issues.apache.org/jira/browse/MESOS-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15309244#comment-15309244 ] Jay Guo commented on MESOS-5468: [~anandmazumdar] Sorry for the delay. One of the two connections between the framework and the master is successfully closed; however, the other is left ESTABLISHED when the master attempts to remove the framework. Upon network rejoin, the master repeatedly denies the subscription call from the framework. So the question is: is the EVENT connection left open intentionally or accidentally? Here's the full log:
{code:title=master.log}
I0601 12:12:03.671700 2252 master.cpp:5195] Status update TASK_FINISHED (UUID: e370dac6-2915-4090-876f-c000d0fe71c7) for task 3 of framework e8288e1d-2c05-4e05-9db7-713a366f7f5f- from agent edbc3730-e55b-4390-a1f2-5de5a66497f5-S0 at slave(1)@127.0.1.1:5051 (ubuntu)
I0601 12:12:03.671931 2252 master.cpp:5243] Forwarding status update TASK_FINISHED (UUID: e370dac6-2915-4090-876f-c000d0fe71c7) for task 3 of framework e8288e1d-2c05-4e05-9db7-713a366f7f5f-
I0601 12:12:03.672360 2252 master.cpp:6853] Updating the state of task 3 of framework e8288e1d-2c05-4e05-9db7-713a366f7f5f- (latest state: TASK_FINISHED, status update state: TASK_FINISHED)
I0601 12:14:43.677433 2247 master.cpp:5195] Status update TASK_FINISHED (UUID: e370dac6-2915-4090-876f-c000d0fe71c7) for task 3 of framework e8288e1d-2c05-4e05-9db7-713a366f7f5f- from agent edbc3730-e55b-4390-a1f2-5de5a66497f5-S0 at slave(1)@127.0.1.1:5051 (ubuntu)
I0601 12:14:43.677781 2247 master.cpp:5243] Forwarding status update TASK_FINISHED (UUID: e370dac6-2915-4090-876f-c000d0fe71c7) for task 3 of framework e8288e1d-2c05-4e05-9db7-713a366f7f5f-
I0601 12:14:43.678387 2247 master.cpp:6853] Updating the state of task 3 of framework e8288e1d-2c05-4e05-9db7-713a366f7f5f- (latest state: TASK_FINISHED, status update state: TASK_FINISHED)
I0601 12:20:03.679064 2251 master.cpp:5195] Status update TASK_FINISHED (UUID: e370dac6-2915-4090-876f-c000d0fe71c7) for task 3 of framework e8288e1d-2c05-4e05-9db7-713a366f7f5f- from agent edbc3730-e55b-4390-a1f2-5de5a66497f5-S0 at slave(1)@127.0.1.1:5051 (ubuntu)
I0601 12:20:03.679194 2251 master.cpp:5243] Forwarding status update TASK_FINISHED (UUID: e370dac6-2915-4090-876f-c000d0fe71c7) for task 3 of framework e8288e1d-2c05-4e05-9db7-713a366f7f5f-
I0601 12:20:03.679565 2251 master.cpp:6853] Updating the state of task 3 of framework e8288e1d-2c05-4e05-9db7-713a366f7f5f- (latest state: TASK_FINISHED, status update state: TASK_FINISHED)
E0601 12:25:02.891707 2254 process.cpp:2040] Failed to shutdown socket with fd 13: Transport endpoint is not connected
I0601 12:25:02.895753 2248 master.cpp:1388] Framework e8288e1d-2c05-4e05-9db7-713a366f7f5f- (Long Lived Framework (C++)) disconnected
I0601 12:25:02.896077 2248 master.cpp:2822] Disconnecting framework e8288e1d-2c05-4e05-9db7-713a366f7f5f- (Long Lived Framework (C++))
I0601 12:25:02.896289 2248 master.cpp:2846] Deactivating framework e8288e1d-2c05-4e05-9db7-713a366f7f5f- (Long Lived Framework (C++))
W0601 12:25:02.896682 2248 master.hpp:1903] Master attempted to send message to disconnected framework e8288e1d-2c05-4e05-9db7-713a366f7f5f- (Long Lived Framework (C++))
W0601 12:25:02.897027 2248 master.hpp:1909] Unable to send event to framework e8288e1d-2c05-4e05-9db7-713a366f7f5f- (Long Lived Framework (C++)): connection closed
I0601 12:25:02.897341 2248 master.cpp:1401] Giving framework e8288e1d-2c05-4e05-9db7-713a366f7f5f- (Long Lived Framework (C++)) 0ns to failover
I0601 12:25:02.896751 2249 hierarchical.cpp:375] Deactivated framework e8288e1d-2c05-4e05-9db7-713a366f7f5f-
I0601 12:25:02.901005 2251 master.cpp:5608] Framework failover timeout, removing framework e8288e1d-2c05-4e05-9db7-713a366f7f5f- (Long Lived Framework (C++))
I0601 12:25:02.901053 2251 master.cpp:6338] Removing framework e8288e1d-2c05-4e05-9db7-713a366f7f5f- (Long Lived Framework (C++))
I0601 12:25:02.901409 2251 master.cpp:6853] Updating the state of task 3 of framework e8288e1d-2c05-4e05-9db7-713a366f7f5f- (latest state: TASK_FINISHED, status update state: TASK_KILLED)
I0601 12:25:02.901449 2251 master.cpp:6919] Removing task 3 with resources cpus(*):0.001; mem(*):1 of framework e8288e1d-2c05-4e05-9db7-713a366f7f5f- on agent edbc3730-e55b-4390-a1f2-5de5a66497f5-S0 at slave(1)@127.0.1.1:5051 (ubuntu)
I0601 12:25:02.901721 2251 master.cpp:6948] Removing executor 'default' with resources cpus(*):0.1; mem(*):32 of framework e8288e1d-2c05-4e05-9db7-713a366f7f5f- on agent edbc3730-e55b-4390-a1f2-5de5a66497f5-S0 at slave(1)@127.0.1.1:5051 (ubuntu)
I0601 12:25:02.902426 2251 hierarchical.cpp:326] Removed framework e8288e1d-2c05-4e05-9db7-713a366f7f5f-
W0601 12:25:08.007905 2253 master.cpp:5291] Ignoring unknown exited executor 'default'
{code}
[jira] [Commented] (MESOS-5359) The scheduler library should have a delay before initiating a connection with master.
[ https://issues.apache.org/jira/browse/MESOS-5359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15309115#comment-15309115 ] José Guilherme Vanz commented on MESOS-5359: I already have a preliminary patch. But before submitting it for review, I would like to ask which approach best fits the Mesos way of doing things. Right now, I added a {{flags}} member to the {{MesosProcess}}:
{code:title=src/scheduler/scheduler.cpp|borderStyle=solid}
MesosProcess(
    const string& master,
    ContentType _contentType,
    const lambda::function<void()>& connected,
    const lambda::function<void()>& disconnected,
    const lambda::function<void(const std::queue<Event>&)>& received,
    const Option<Credential>& _credential,
    const Option<shared_ptr<MasterDetector>>& _detector,
    const mesos::v1::scheduler::Flags& _flags)
  : ProcessBase(ID::generate("scheduler")),
    state(DISCONNECTED),
    contentType(_contentType),
    callbacks {connected, disconnected, received},
    credential(_credential),
    local(false),
    flags(_flags) {

Mesos::Mesos(
    const string& master,
    ContentType contentType,
    const lambda::function<void()>& connected,
    const lambda::function<void()>& disconnected,
    const lambda::function<void(const std::queue<Event>&)>& received,
    const Option<Credential>& credential,
    const Option<shared_ptr<MasterDetector>>& detector,
    const mesos::v1::scheduler::Flags& flags) {
{code}
The {{mesos::v1::scheduler::Flags}} is a class created following the {{src/sched/flags.hpp}} example. However, I'm not sure whether passing the {{Flags}} object is the best idea. I believe the old API does that because the scheduler driver, as an "internal" class, is responsible for it. The new API's {{Mesos}} class is instantiated by the scheduler itself, so I had to add {{mesos::v1::scheduler::Flags}} to the include dir, allowing the scheduler to instantiate the class. Is that OK? Or should I pass just the flag values to the {{Mesos}} constructor, like the master connection URL? > The scheduler library should have a delay before initiating a connection with > master. 
> - > > Key: MESOS-5359 > URL: https://issues.apache.org/jira/browse/MESOS-5359 > Project: Mesos > Issue Type: Bug >Affects Versions: 1.0.0 >Reporter: Anand Mazumdar >Assignee: José Guilherme Vanz > Labels: mesosphere > > Currently, the scheduler library {{src/scheduler/scheduler.cpp}} does not have an > artificially induced delay when trying to initially establish a connection > with the master. In the event of a master failover or ZK disconnect, a large > number of frameworks can get disconnected and thereby overwhelm the > master with TCP SYN requests. > On a large cluster with many agents, the master is already overwhelmed with > handling connection requests from the agents. This compounds the issue > further on the master. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
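The delay proposed in MESOS-5359 is commonly implemented as capped exponential backoff with jitter, so that frameworks disconnected by the same master failover do not all reconnect in lock-step. The sketch below is illustrative only (names, defaults, and the jitter policy are assumptions, not the actual Mesos implementation):

```cpp
#include <algorithm>
#include <chrono>
#include <cstdint>
#include <random>

// Sketch: capped exponential backoff. The delay for attempt n is
// min(base * 2^n, cap); an optional RNG adds up to 50% jitter so that
// many reconnecting frameworks spread out instead of SYN-flooding the
// master. All names and defaults here are illustrative.
std::chrono::milliseconds backoffDelay(
    int attempt,
    std::chrono::milliseconds base = std::chrono::milliseconds(500),
    std::chrono::milliseconds cap = std::chrono::milliseconds(30000),
    std::mt19937* rng = nullptr)
{
  // Clamp the exponent so the shift cannot overflow 64 bits.
  const int exponent = std::min(attempt, 20);
  int64_t delay = std::min<int64_t>(base.count() << exponent, cap.count());

  if (rng != nullptr) {
    std::uniform_int_distribution<int64_t> jitter(0, delay / 2);
    delay = std::min<int64_t>(delay + jitter(*rng), cap.count());
  }

  return std::chrono::milliseconds(delay);
}
```

With `rng == nullptr` the schedule is deterministic (500ms, 1s, 2s, ... capped at 30s); passing a seeded engine spreads the reconnects.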
[jira] [Commented] (MESOS-5395) Task getting stuck in staging state if launch it on a rebooted slave.
[ https://issues.apache.org/jira/browse/MESOS-5395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15308926#comment-15308926 ] Gilbert Song commented on MESOS-5395: - [~Mengkui], Thanks for reporting this issue. Could you reproduce this issue and check whether restarting the slave process resolves it? BTW, could you verify that https://issues.apache.org/jira/browse/MESOS-5482 is identical to this issue? Thanks. :) > Task getting stuck in staging state if launch it on a rebooted slave. > - > > Key: MESOS-5395 > URL: https://issues.apache.org/jira/browse/MESOS-5395 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.28.0 > Environment: mesos/marathon cluster, 3 masters/4 slaves > Mesos: 0.28.0, Marathon 0.15.2 >Reporter: Mengkui gong > Attachments: mesos-log.zip > > > After rebooting a slave, a task launched via Marathon > can start on other slaves without problems. But if it is launched on the rebooted > slave, the task gets stuck. The Mesos UI shows it in staging state in the > active tasks list, and the Marathon UI shows it in deploying state. It can > stay stuck for more than 2 hours. After that time, Marathon will > automatically launch the task on the rebooted slave or another slave as > normal, so the rebooted slave recovers as well after that time. > In the Mesos log, I can see "telling slave to kill task" all the time. > I0517 15:25:27.207237 20568 master.cpp:3826] Telling slave > 282745ab-423a-4350-a449-3e8cdfccfb93-S1 at slave(1)@10.254.234.236:5050 > (mesos-slave-3) to kill task > project-hub_project-hub-frontend.b645f24b-1c1f-11e6-bb25-d00d2cce797e of > framework 17cd3756-1d59-4dfc-984d-3fe09f6b5730- (marathon) at > scheduler-fe615b72-ab92-49ca-89e6-e74e600c7e15@10.254.228.3:56757. 
> From rebooted slave log, I can see: > May 17 15:28:37 euca-10-254-234-236 mesos-slave[829]: I0517 15:28:37.206831 > 916 slave.cpp:1891] Asked to kill task > project-hub_project-hub-frontend.b645f24b-1c1f-11e6-bb25-d00d2cce797e of > framework 17cd3756-1d59-4dfc-984d-3fe09f6b5730- > May 17 15:28:37 euca-10-254-234-236 mesos-slave[829]: W0517 15:28:37.206866 > 916 slave.cpp:2018] Ignoring kill task > project-hub_project-hub-frontend.b645f24b-1c1f-11e6-bb25-d00d2cce797e because > the executor > 'project-hub_project-hub-frontend.b645f24b-1c1f-11e6-bb25-d00d2cce797e' of > framework 17cd3756-1d59-4dfc-984d-3fe09f6b5730- is terminating/terminated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5520) Before starting a build, bootstrapping shows some warning
[ https://issues.apache.org/jira/browse/MESOS-5520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15308870#comment-15308870 ] Gilbert Song commented on MESOS-5520: - [~anksv], this seems related to the libprocess/stout Makefiles and should not block the build. It should be an easy Makefile fix. > Before starting a build, bootstrapping shows some warning > - > > Key: MESOS-5520 > URL: https://issues.apache.org/jira/browse/MESOS-5520 > Project: Mesos > Issue Type: Bug > Components: build >Affects Versions: 0.28.1 > Environment: ubuntu 14.04.4 >Reporter: Ankur Verma >Priority: Minor > Attachments: bootstrap_cmd_logs.txt > > > The first time, before building, some warnings appear when running the bootstrap command: > # Bootstrap (Only required if building from git repository). > $ ./bootstrap > Logs: > 3rdparty/stout/Makefile.am:71: warning: source file > 'tests/subcommand_tests.cpp' is in a subdirectory, > 3rdparty/stout/Makefile.am:71: but option 'subdir-objects' is disabled > 3rdparty/stout/Makefile.am:71: warning: source file 'tests/svn_tests.cpp' is > in a subdirectory, > 3rdparty/stout/Makefile.am:71: but option 'subdir-objects' is disabled > 3rdparty/stout/Makefile.am:71: warning: source file 'tests/try_tests.cpp' is > in a subdirectory, > 3rdparty/stout/Makefile.am:71: but option 'subdir-objects' is disabled > 3rdparty/stout/Makefile.am:71: warning: source file 'tests/uuid_tests.cpp' is > in a subdirectory, > 3rdparty/stout/Makefile.am:71: but option 'subdir-objects' is disabled > 3rdparty/stout/Makefile.am:71: warning: source file 'tests/version_tests.cpp' > is in a subdirectory, > 3rdparty/stout/Makefile.am:71: but option 'subdir-objects' is disabled > 3rdparty/stout/Makefile.am:122: warning: source file 'tests/proc_tests.cpp' > is in a subdirectory, > 3rdparty/stout/Makefile.am:122: but option 'subdir-objects' is disabled -- This message was sent by Atlassian JIRA (v6.3.4#6332)
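For reference, this class of automake warning is typically silenced by enabling the {{subdir-objects}} option in the affected Makefile.am (a sketch of the usual fix; whether Mesos chooses this exact spot, or sets the option globally via AM_INIT_AUTOMAKE in configure.ac, is an assumption):

```
# In 3rdparty/stout/Makefile.am: enable per-subdirectory object files
# so that sources like tests/uuid_tests.cpp no longer trigger the
# "option 'subdir-objects' is disabled" warnings.
AUTOMAKE_OPTIONS = subdir-objects
```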
[jira] [Commented] (MESOS-5405) Make fields in authorization::Request protobuf optional.
[ https://issues.apache.org/jira/browse/MESOS-5405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15308861#comment-15308861 ] Joerg Schad commented on MESOS-5405: https://reviews.apache.org/r/48101/ > Make fields in authorization::Request protobuf optional. > > > Key: MESOS-5405 > URL: https://issues.apache.org/jira/browse/MESOS-5405 > Project: Mesos > Issue Type: Bug >Reporter: Alexander Rukletsov >Assignee: Till Toenshoff >Priority: Blocker > Labels: mesosphere, security > Fix For: 1.0.0 > > > Currently the {{authorization::Request}} protobuf declares {{subject}} and > {{object}} as required fields. However, in the codebase we do not always set > them, which leaves the message in an uninitialized state, for example: > * > https://github.com/apache/mesos/blob/0bfd6999ebb55ddd45e2c8566db17ab49bc1ffec/src/common/http.cpp#L603 > * > https://github.com/apache/mesos/blob/0bfd6999ebb55ddd45e2c8566db17ab49bc1ffec/src/master/http.cpp#L2057 > I believe the reason we don't see issues related to this is that we > never send authz requests over the wire, i.e., never serialize/deserialize > them. However, they are still invalid protobuf messages. Moreover, some > external authorizers may serialize these messages. > We can either ensure all required fields are set or make both {{subject}} and > {{object}} fields optional. This will also require updating the local authorizer, > which should properly handle the situation when these fields are absent. We > may also want to notify authors of external authorizers to update their code > accordingly. > It looks like no deprecation is necessary, mainly because we > already—erroneously!—treat these fields as optional. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
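The change the issue proposes can be sketched as a one-word relaxation in the proto definition. The field names and numbers below are illustrative, not a verbatim copy of authorizer.proto:

```proto
// Hypothetical sketch of the proposal: relaxing 'required' to
// 'optional' so a Request without a subject or object is still a
// well-formed protobuf message.
message Request {
  optional Subject subject = 1;  // was: required
  required Action action = 2;
  optional Object object = 3;    // was: required
}
```

In proto2, an unset required field makes serialization of the whole message fail, which is exactly the hazard the issue describes for external authorizers.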
[jira] [Commented] (MESOS-5482) mesos/marathon task stuck in staging after slave reboot
[ https://issues.apache.org/jira/browse/MESOS-5482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15308843#comment-15308843 ] Gilbert Song commented on MESOS-5482: - [~lutfu], thanks for reporting this issue. Are you always able to reproduce this issue, or does it happen only occasionally (like a race)? > mesos/marathon task stuck in staging after slave reboot > --- > > Key: MESOS-5482 > URL: https://issues.apache.org/jira/browse/MESOS-5482 > Project: Mesos > Issue Type: Bug >Reporter: lutful karim > Attachments: marathon-mesos-masters_after-reboot.log, > mesos-masters_mesos.log, mesos_slaves_after_reboot.log, > tasks_running_before_rebooot.marathon > > > The main idea of mesos/marathon is to sleep well, but after a node reboot the mesos > task gets stuck in staging for about 4 hours. > To reproduce the issue: > - setup a mesos cluster in HA mode with systemd-enabled mesos-master and > mesos-slave services. > - run the docker registry (https://hub.docker.com/_/registry/ ) with a mesos > constraint (hostname:LIKE:mesos-slave-1) on one node. Reboot the node and > notice the task getting stuck in staging. > Possible workaround: service mesos-slave restart fixes the issue. > OS: centos 7.2 > mesos version: 0.28.1 > marathon: 1.1.1 > zookeeper: 3.4.8 > docker: 1.9.1 dockerAPIversion: 1.21 > error message: > May 30 08:38:24 euca-10-254-237-140 mesos-slave[832]: W0530 08:38:24.120013 > 909 slave.cpp:2018] Ignoring kill task > docker-registry.066fb448-2628-11e6-bedd-d00d0ef81dc3 because the executor > 'docker-registry.066fb448-2628-11e6-bedd-d00d0ef81dc3' of framework > 8517fcb7-f2d0-47ad-ae02-837570bef929- is terminating/terminated -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5339) Create Tests for testing fine-grained HTTP endpoint filtering.
[ https://issues.apache.org/jira/browse/MESOS-5339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam B updated MESOS-5339: -- Fix Version/s: 1.0.0 > Create Tests for testing fine-grained HTTP endpoint filtering. > -- > > Key: MESOS-5339 > URL: https://issues.apache.org/jira/browse/MESOS-5339 > Project: Mesos > Issue Type: Improvement >Reporter: Joerg Schad >Assignee: Joerg Schad > Fix For: 1.0.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5439) registerExecutor problem
[ https://issues.apache.org/jira/browse/MESOS-5439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15308814#comment-15308814 ] Gilbert Song commented on MESOS-5439: - hi [~wnghksrla001], are you saying it is only slow between 'Forked child with pid' and 'Got registration for executor', or that all of the agent logging is slow? If it is the former, it may be related to the executor. In the usual case, it should be pretty quick. You can test it by launching some similar tasks using mesos-execute with the command executor. > registerExecutor problem > > > Key: MESOS-5439 > URL: https://issues.apache.org/jira/browse/MESOS-5439 > Project: Mesos > Issue Type: Bug > Components: c++ api, slave >Affects Versions: 0.27.0 >Reporter: kimjoohwan > > Currently, we are using Mesos 0.27.0. The master is built with an Intel(R) > Core(TM) i5-3470 CPU @ 3.20GHz and 4GB of RAM. The slave (Banana Pi) has > a Cortex-A7 dual-core CPU and 1GB of RAM. > Using the Mesos API, we have developed a Python-based framework and run it successfully. > However, we found that it takes too much time (5 seconds) between the messages 'Forked child > with pid' and 'Got registration for executor' in the slave log. > If you know how to deal with this problem, please let us know. 
> I0523 17:38:16.264289 1787 slave.cpp:5208] Launching executor default of > framework 3fb86eea-96c4-4b07-aaa2-caf071275bdf-0010 with resources in work > directory > '/tmp/mesos/slaves/3fb86eea-96c4-4b07-aaa2-caf071275bdf-S2/frameworks/3fb86eea-96c4-4b07-aaa2-caf071275bdf-0010/executors/default/runs/1c830c9a-4120-4ef0-af80-49a52d307539' > I0523 17:38:16.290601 1789 containerizer.cpp:616] Starting container > '1c830c9a-4120-4ef0-af80-49a52d307539' for executor 'default' of framework > '3fb86eea-96c4-4b07-aaa2-caf071275bdf-0010' > I0523 17:38:16.293285 1787 slave.cpp:1626] Queuing task '0' for executor > 'default' of framework 3fb86eea-96c4-4b07-aaa2-caf071275bdf-0010 > I0523 17:38:16.297369 1787 slave.cpp:4233] Current disk usage 2.14%. Max > allowed age: 6.150293798159722days > I0523 17:38:16.504043 1789 launcher.cpp:132] Forked child with pid '1837' > for container '1c830c9a-4120-4ef0-af80-49a52d307539' > I0523 17:38:21.510535 1785 slave.cpp:2573] Got registration for executor > 'default' of framework 3fb86eea-96c4-4b07-aaa2-caf071275bdf-0010 from > executor(1)@192.168.0.8:56508 > I0523 17:38:21.554608 1785 slave.cpp:1791] Sending queued task '0' to > executor 'default' of framework 3fb86eea-96c4-4b07-aaa2-caf071275bdf-0010 at > executor(1)@192.168.0.8:56508 > I0523 17:38:21.594511 1789 slave.cpp:2932] Handling status update > TASK_RUNNING (UUID: cd04ec2a-0e68-460a-ad2e-e4f504f3b032) for task 0 of > framework 3fb86eea-96c4-4b07-aaa2-caf071275bdf-0010 from > executor(1)@192.168.0.8:56508 > I0523 17:38:21.600050 1789 slave.cpp:2932] Handling status update > TASK_FINISHED (UUID: 46e110c8-4078-4f98-ae30-30b3a1376034) for task 0 of > framework 3fb86eea-96c4-4b07-aaa2-caf071275bdf-0010 from > executor(1)@192.168.0.8:56508 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5405) Make fields in authorization::Request protobuf optional.
[ https://issues.apache.org/jira/browse/MESOS-5405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15308765#comment-15308765 ] Till Toenshoff commented on MESOS-5405: --- sgtm -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5405) Make fields in authorization::Request protobuf optional.
[ https://issues.apache.org/jira/browse/MESOS-5405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15308764#comment-15308764 ] Till Toenshoff commented on MESOS-5405: --- Additional work was done on this issue: Added {{Request}} sanity checks in {{LocalAuthorizer}}: https://reviews.apache.org/r/48085/ Updated comments in authorizer.proto: https://reviews.apache.org/r/48093/ Note that the latter tries to supersede https://reviews.apache.org/r/47876 - by borrowing some inspiration from it - thanks [~adam-mesos]! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (MESOS-5405) Make fields in authorization::Request protobuf optional.
[ https://issues.apache.org/jira/browse/MESOS-5405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15308700#comment-15308700 ] Till Toenshoff edited comment on MESOS-5405 at 5/31/16 9:46 PM: [~tillt] [~adam-mesos] [~mcypark] This breaks some assumptions of the current {{authorized}} interface, which assumes {{subject}} and {{object}} are set (see below). To accommodate these new optional fields, I would propose the following: 1. Change {{getObjectApprover}}'s signatures to accept {{Option}}-wrapped arguments. 2. Change {{objectApprover->approved()}}'s signature to accept an {{Option}} (and adapt the logic in {{approved}} for the LocalAuthorizerObjectApprover to deal with the None -> Any conversion).
{noformat}
Future<bool> authorized(const authorization::Request& request)
{
  return getObjectApprover(request.subject(), request.action())
    .then([=](const Owned<ObjectApprover>& objectApprover) -> Future<bool> {
      ObjectApprover::Object object(request.object());
      Try<bool> result = objectApprover->approved(object);
      if (result.isError()) {
        return Failure(result.error());
      }
      return result.get();
    });
}
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5405) Make fields in authorization::Request protobuf optional.
[ https://issues.apache.org/jira/browse/MESOS-5405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15308700#comment-15308700 ] Joerg Schad commented on MESOS-5405: [~tillt] [~adam-mesos] [~mcypark] This breaks some assumptions of the current {{authorized}} interface, which assumes {{subject}} and {{object}} are set (see below). To accommodate these new optional fields, I would propose the following: 1. Change {{getObjectApprover}}'s signatures to accept {{Option}}-wrapped arguments. 2. Change {{objectApprover->approved()}}'s signature to accept an {{Option}} (and adapt the logic in {{approved}} for the LocalAuthorizerObjectApprover to deal with the None -> Any conversion).
```
Future<bool> authorized(const authorization::Request& request)
{
  return getObjectApprover(request.subject(), request.action())
    .then([=](const Owned<ObjectApprover>& objectApprover) -> Future<bool> {
      ObjectApprover::Object object(request.object());
      Try<bool> result = objectApprover->approved(object);
      if (result.isError()) {
        return Failure(result.error());
      }
      return result.get();
    });
}
```
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
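The proposed "None -> Any" semantics can be illustrated with a self-contained sketch. It uses {{std::optional}} in place of stout's {{Option}}, and all names are illustrative stand-ins, not the actual Mesos {{ObjectApprover}} interface: an absent object means the caller asks whether *any* object would be approved, so only an unrestricted approver may answer yes.

```cpp
#include <optional>
#include <string>
#include <utility>

// Illustrative stand-in for the proposed interface change: approved()
// takes an optional object. std::nullopt models the "None -> Any"
// case, i.e. "would this approver approve ANY object?".
struct Object
{
  std::string value;
};

class ObjectApprover
{
public:
  // An approver constructed with std::nullopt is unrestricted and
  // approves every object; otherwise only the matching object.
  explicit ObjectApprover(std::optional<std::string> allowed)
    : allowed_(std::move(allowed)) {}

  bool approved(const std::optional<Object>& object) const
  {
    if (!object.has_value()) {
      // "Any" object: only an unrestricted approver can say yes.
      return !allowed_.has_value();
    }
    return !allowed_.has_value() || allowed_.value() == object->value;
  }

private:
  std::optional<std::string> allowed_;
};
```

The design choice mirrors the comment's point: pushing the optionality into the {{approved()}} signature keeps the None-handling logic in one place instead of at every call site.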
[jira] [Commented] (MESOS-5494) Implement GET_ROLES Call in v1 master API.
[ https://issues.apache.org/jira/browse/MESOS-5494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15308676#comment-15308676 ] Abhishek Dasgupta commented on MESOS-5494: -- RR: https://reviews.apache.org/r/48094 > Implement GET_ROLES Call in v1 master API. > -- > > Key: MESOS-5494 > URL: https://issues.apache.org/jira/browse/MESOS-5494 > Project: Mesos > Issue Type: Task >Reporter: Vinod Kone >Assignee: Abhishek Dasgupta > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5405) Make fields in authorization::Request protobuf optional.
[ https://issues.apache.org/jira/browse/MESOS-5405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Toenshoff updated MESOS-5405: -- Shepherd: Adam B -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-5529) Distinguish non-revocable and revocable allocation guarantees.
Benjamin Mahler created MESOS-5529: -- Summary: Distinguish non-revocable and revocable allocation guarantees. Key: MESOS-5529 URL: https://issues.apache.org/jira/browse/MESOS-5529 Project: Mesos Issue Type: Epic Components: allocation Reporter: Benjamin Mahler Currently, the notions of fair sharing and quota do not make a distinction between revocable and non-revocable resources. However, this makes fair sharing difficult since we currently offer resources as non-revocable within the fair share and cannot perform revocation when we need to restore fairness or quota. As we move towards providing guarantees for particular resource types, we may want to allow the operator to specify quota (absolutes) or shares (relatives) for both revocable and non-revocable resources:
| |*Non-revocable*|*Revocable*|
|*Quota*|absolute guarantees for non-revocable resources (well suited for service-like always running workloads)|absolute guarantees for revocable resources (useful for expressing minimum requirements of batch workload?)|
|*Fair Share*|relative guarantees for non-revocable resources (e.g. backwards compatibility with old behavior)|relative guarantees for revocable resources (e.g. well suited for fair sharing in a dynamic cluster)|
See MESOS-5526 for revocation support. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-5528) Use inverse offers to reclaim resources from schedulers over their quota.
Benjamin Mahler created MESOS-5528: -- Summary: Use inverse offers to reclaim resources from schedulers over their quota. Key: MESOS-5528 URL: https://issues.apache.org/jira/browse/MESOS-5528 Project: Mesos Issue Type: Epic Components: allocation Reporter: Benjamin Mahler As we move towards distinguishing non-revocable and revocable allocation of resources, we need to ensure that the upper limits specified via quota are enforced. For example, if a scheduler has quota for non-revocable resources and there is only fair sharing turned on for revocable resources, the scheduler should not be able to consume more non-revocable resources than its quota limit. Even if Mesos disallows this when tasks are launched, there are cases where the scheduler can exceed its quota: * Unreachable nodes that were not accounted for reconnect to the cluster with existing resources allocated to the scheduler's role. * The operator lowers the amount of quota for the role. In these cases and more generally, we need an always-running mechanism for reclaiming excess quota allocation via inverse offers. The deadline should be configurable by the operator. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-5527) Provide work conservation incentives for schedulers.
Benjamin Mahler created MESOS-5527: -- Summary: Provide work conservation incentives for schedulers. Key: MESOS-5527 URL: https://issues.apache.org/jira/browse/MESOS-5527 Project: Mesos Issue Type: Epic Components: allocation, framework Reporter: Benjamin Mahler As we begin to add support for schedulers to revoke resources to obtain their quota or fair share, we need to consider the case of non-cooperative or malicious schedulers that cause excessive revocation either by accident or intentionally. For example, a malicious scheduler could keep a low allocation below its fair share, and revoke as many resources as it can in order to disturb existing work as much as possible. We can provide mitigation techniques, or incentives / penalties to schedulers that cause excessive revocation: * Disallow revocation when resources are available to a scheduler. The scheduler must choose available resources or wait until allocated resources free up. This means picky schedulers may not obtain the resources they want. * Penalize schedulers causing excessive revocation in order to incentivize them to play nicely. * Use a degree of pessimism to restrict which resources a scheduler can revoke (e.g. only batch tasks that have not been running for a long time). If we augment task information to know whether it is a service or a batch job, we may be able to do better here. * etc. The techniques employed for work conservation in the presence of revocation should be configurable, and users should be able to achieve their own custom work conservation policies by implementing an allocator (or a subcomponent of the existing allocator). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-5526) Allow schedulers to revoke resources to obtain their quota or fair share.
Benjamin Mahler created MESOS-5526: -- Summary: Allow schedulers to revoke resources to obtain their quota or fair share. Key: MESOS-5526 URL: https://issues.apache.org/jira/browse/MESOS-5526 Project: Mesos Issue Type: Epic Components: allocation Reporter: Benjamin Mahler In order to ensure fairness and quota guarantees are met in a dynamic cluster, we need to ensure that schedulers can revoke existing revocable allocations in order to obtain their fair share or their quota. Otherwise, schedulers must wait (potentially forever!) until existing allocations are freed. Waiting is a policy that completely favors work conservation; revocation instead trades some work conservation in favor of meeting the fairness and quota guarantees in a bounded amount of time. As we expose resource constraints to schedulers (MESOS-5524), they will be able to determine when Mesos will allow them to revoke resources. For example: * If a scheduler is below its fair share, the scheduler may revoke existing revocable resources that are offered to it. * If a scheduler is below its quota, it can revoke existing revocable resources in order to consume them for quota in a non-revocable manner. This is orthogonal to optimistic or pessimistic allocation, in that either approach needs to allow the schedulers to perform revocation in this manner. In the pessimistic approach, we may confine what the scheduler can revoke, and in an optimistic approach, we may provide more choice to the scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5526) Allow schedulers to revoke resources to obtain their quota or fair share.
[ https://issues.apache.org/jira/browse/MESOS-5526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Mahler updated MESOS-5526: --- Component/s: framework api > Allow schedulers to revoke resources to obtain their quota or fair share. > - > > Key: MESOS-5526 > URL: https://issues.apache.org/jira/browse/MESOS-5526 > Project: Mesos > Issue Type: Epic > Components: allocation, framework api >Reporter: Benjamin Mahler > > In order to ensure fairness and quota guarantees are met in a dynamic > cluster, we need to ensure that schedulers can revoke existing revocable > allocations in order to obtain their fair share or their quota. Otherwise, > schedulers must wait (potentially forever!) until existing allocations are > freed. This is a policy that completely favors work conservation, in favor of > meeting the fairness and quota guarantees in a bounded amount of time. > As we expose resource constraints to schedulers (MESOS-5524), they will be > able to determine when Mesos will allow them to revoke resources. For example: > * If a scheduler is below its fair share, the scheduler may revoke existing > revocable resources that are offered to it. > * If a scheduler is below its quota, it can revoke existing revocable > resources in order to consume it for quota in a non-revocable manner. > This is orthogonal to optimistic or pessimistic allocation, in that either > approaches need to allow the schedulers to perform revocation in this manner. > In the pessimistic approach, we may confine what the scheduler can revoke, > and in an optimistic approach, we may provide more choice to the scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-5525) Allow schedulers to decide whether to consume resources as revocable or non-revocable.
Benjamin Mahler created MESOS-5525: -- Summary: Allow schedulers to decide whether to consume resources as revocable or non-revocable. Key: MESOS-5525 URL: https://issues.apache.org/jira/browse/MESOS-5525 Project: Mesos Issue Type: Epic Components: framework api, allocation Reporter: Benjamin Mahler The idea here is that although some resources may only be consumed in a revocable manner (e.g. oversubscribed resources, resources from "spot instances", etc), other resources may be consumed in a non-revocable manner (e.g. dedicated instance, on-premise machine). However, a scheduler may wish to consume these non-revocable resources in a revocable manner. For example, if the scheduler has quota for non-revocable resources it may not want to use its quota for a particular task and may wish to launch it in a revocable manner out of its fair share. See: In order to support this, we should adjust the meaning of revocable and non-revocable resources in order to allow schedulers to decide how to consume them. The scheduler could choose to consume non-revocable resources in a revocable manner in order to use its fair share of revocable resources rather than its quota. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-5524) Expose resource consumption constraints (quota, shares) to schedulers.
Benjamin Mahler created MESOS-5524: -- Summary: Expose resource consumption constraints (quota, shares) to schedulers. Key: MESOS-5524 URL: https://issues.apache.org/jira/browse/MESOS-5524 Project: Mesos Issue Type: Epic Components: scheduler api, allocation Reporter: Benjamin Mahler Currently, schedulers do not have visibility into their quota or shares of the cluster. By providing this information, we give the scheduler the ability to make better decisions. As we start to allow schedulers to decide how they'd like to use a particular resource (e.g. as non-revocable or revocable), schedulers need visibility into their quota and shares to make an effective decision (otherwise they may accidentally exceed their quota and will not find out until mesos replies with TASK_LOST REASON_QUOTA_EXCEEDED). We would start by exposing the following information: * quota: e.g. cpus:10, mem:20, disk:40 * shares: e.g. cpus:20, mem:40, disk:80 Currently, quota is used for non-revocable resources and the idea is to use shares only for consuming revocable resources since the number of shares available to a role changes dynamically as resources come and go, frameworks come and go, or the operator manipulates the amount of resources sectioned off for quota. By exposing quota and shares, the framework knows when it can consume additional non-revocable resources (i.e. when it has fewer non-revocable resources allocated to it than its quota) or when it can consume revocable resources (always! but in the future, it cannot revoke another user's revocable resources if the framework is above its fair share). This also allows schedulers to determine whether they have sufficient quota assigned to them, and to alert the operator if they need more to run safely. 
Also, by viewing their fair share, the framework can expose monitoring information that shows the discrepancy between how much it would like and its fair share (note that the framework can actually exceed its fair share but in the future this will mean increased potential for revocation). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
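A scheduler consuming this information could compute its remaining non-revocable headroom along the lines of the sketch below. The function name and the plain-dict resource representation are invented for illustration; this is not a Mesos API:

```python
# Hypothetical sketch (not a Mesos API): given the quota exposed to a
# scheduler and its current non-revocable allocation, compute how many
# more non-revocable resources it may consume before hitting its quota
# limit -- i.e. when it must fall back to revocable consumption.
def nonrevocable_headroom(quota, allocated):
    """Per-resource headroom: quota limit minus current allocation, floored at 0."""
    return {name: max(limit - allocated.get(name, 0), 0)
            for name, limit in quota.items()}

quota = {"cpus": 10, "mem": 20, "disk": 40}  # e.g. as exposed by the master
allocated = {"cpus": 6, "mem": 20}           # current non-revocable usage

headroom = nonrevocable_headroom(quota, allocated)
# headroom == {"cpus": 4, "mem": 0, "disk": 40}: 4 more cpus can be
# consumed non-revocably; any further mem would exceed quota.
```

The same computation against the shares numbers would tell the scheduler how far it sits from its fair share, which is exactly the monitoring signal described above.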
[jira] [Commented] (MESOS-4642) Mesos Agent Json API can dump binary data from log files out as invalid JSON
[ https://issues.apache.org/jira/browse/MESOS-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15308445#comment-15308445 ] Vinod Kone commented on MESOS-4642: --- Sounds like a plan Chris. Do you want to send a PR or review for the doc change? Here is the code https://github.com/apache/mesos/blob/master/src/files/files.cpp#L399 > Mesos Agent Json API can dump binary data from log files out as invalid JSON > > > Key: MESOS-4642 > URL: https://issues.apache.org/jira/browse/MESOS-4642 > Project: Mesos > Issue Type: Bug > Components: json api, slave >Affects Versions: 0.27.0 >Reporter: Steven Schlansker >Priority: Critical > > One of our tasks accidentally started logging binary data to stderr. This > was not intentional and generally should not happen -- however, it causes > severe problems with the Mesos Agent "files/read.json" API, since it gladly > dumps this binary data out as invalid JSON. > {code} > # hexdump -C /path/to/task/stderr | tail > 0003d1f0 6f 6e 6e 65 63 74 69 6f 6e 0a 4e 45 54 3a 20 31 |onnection.NET: 1| > 0003d200 20 6f 6e 72 65 61 64 20 45 4e 4f 45 4e 54 20 32 | onread ENOENT 2| > 0003d210 39 35 34 35 36 20 32 35 31 20 32 39 35 37 30 37 |95456 251 295707| > 0003d220 0a 01 00 00 00 00 00 00 ac 57 65 64 2c 20 31 30 |.Wed, 10| > 0003d230 20 55 6e 72 65 63 6f 67 6e 69 7a 65 64 20 69 6e | Unrecognized in| > 0003d240 70 75 74 20 68 65 61 64 65 72 0a |put header.| > {code} > {code} > # curl > 'http://agent-host:5051/files/read.json?path=/path/to/task/stderr=220443=9=' > | hexdump -C > 7970 6e 65 63 74 69 6f 6e 5c 6e 4e 45 54 3a 20 31 20 |nection\nNET: 1 | > 7980 6f 6e 72 65 61 64 20 45 4e 4f 45 4e 54 20 32 39 |onread ENOENT 29| > 7990 35 34 35 36 20 32 35 31 20 32 39 35 37 30 37 5c |5456 251 295707\| > 79a0 6e 5c 75 30 30 30 31 5c 75 30 30 30 30 5c 75 30 |n\u0001\u\u0| > 79b0 30 30 30 5c 75 30 30 30 30 5c 75 30 30 30 30 5c |000\u\u\| > 79c0 75 30 30 30 30 5c 75 30 30 30 30 ac 57 65 64 2c |u\u.Wed,| > 79d0 20 31 30 20 55 6e 72 65 
63 6f 67 6e 69 7a 65 64 | 10 Unrecognized| > 79e0 20 69 6e 70 75 74 20 68 65 61 64 65 72 5c 6e 22 | input header\n"| > 79f0 2c 22 6f 66 66 73 65 74 22 3a 32 32 30 34 34 33 |,"offset":220443| > 7a00 7d|}| > {code} > This causes downstream sadness: > {code} > ERROR [2016-02-10 18:55:12,303] > io.dropwizard.jersey.errors.LoggingExceptionMapper: Error handling a request: > 0ee749630f8b26f1 > ! com.fasterxml.jackson.core.JsonParseException: Invalid UTF-8 start byte 0xac > ! at [Source: org.jboss.netty.buffer.ChannelBufferInputStream@6d69ee8; line: > 1, column: 31181] > ! at > com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1487) > ~[singularity-0.4.9.jar:0.4.9] > ! at > com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:518) > ~[singularity-0.4.9.jar:0.4.9] > ! at > com.fasterxml.jackson.core.json.UTF8StreamJsonParser._reportInvalidInitial(UTF8StreamJsonParser.java:3339) > ~[singularity-0.4.9.jar:0.4.9] > ! at > com.fasterxml.jackson.core.json.UTF8StreamJsonParser._reportInvalidChar(UTF8StreamJsonParser.java:) > ~[singularity-0.4.9.jar:0.4.9] > ! at > com.fasterxml.jackson.core.json.UTF8StreamJsonParser._finishString2(UTF8StreamJsonParser.java:2360) > ~[singularity-0.4.9.jar:0.4.9] > ! at > com.fasterxml.jackson.core.json.UTF8StreamJsonParser._finishString(UTF8StreamJsonParser.java:2287) > ~[singularity-0.4.9.jar:0.4.9] > ! at > com.fasterxml.jackson.core.json.UTF8StreamJsonParser.getText(UTF8StreamJsonParser.java:286) > ~[singularity-0.4.9.jar:0.4.9] > ! at > com.fasterxml.jackson.databind.deser.std.StringDeserializer.deserialize(StringDeserializer.java:29) > ~[singularity-0.4.9.jar:0.4.9] > ! at > com.fasterxml.jackson.databind.deser.std.StringDeserializer.deserialize(StringDeserializer.java:12) > ~[singularity-0.4.9.jar:0.4.9] > ! at > com.fasterxml.jackson.databind.deser.SettableBeanProperty.deserialize(SettableBeanProperty.java:523) > ~[singularity-0.4.9.jar:0.4.9] > ! 
at > com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeUsingPropertyBased(BeanDeserializer.java:381) > ~[singularity-0.4.9.jar:0.4.9] > ! at > com.fasterxml.jackson.databind.deser.BeanDeserializerBase.deserializeFromObjectUsingNonDefault(BeanDeserializerBase.java:1073) > ~[singularity-0.4.9.jar:0.4.9] > ! at > com.fasterxml.jackson.module.afterburner.deser.SuperSonicBeanDeserializer.deserializeFromObject(SuperSonicBeanDeserializer.java:196) > ~[singularity-0.4.9.jar:0.4.9] > ! at > com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:142) >
[jira] [Commented] (MESOS-5457) Create a small testing doc for the v1 Scheduler/Executor API
[ https://issues.apache.org/jira/browse/MESOS-5457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15308431#comment-15308431 ] Vinod Kone commented on MESOS-5457: --- Here are some of the improvements that were done as part of these tests: commit bf7162205b53114eb7367fa322951d573cbb716d Author: Anand Mazumdar Date: Tue May 31 13:28:56 2016 -0600 Added move semantics to `Future::set`. Review: https://reviews.apache.org/r/47989/ commit 6ce7279b2399a02f524692ff5799d637b99b38ff Author: Anand Mazumdar Date: Tue May 31 13:28:51 2016 -0600 Added move constructor/assignment to `Try`. Review: https://reviews.apache.org/r/47988/ commit ae53e3b9980465119cd073620c02baf6e52d5695 Author: Anand Mazumdar Date: Tue May 31 13:28:47 2016 -0600 Constrained constructible types constructor for `Result`. This ensures that `Result` can only be created from constructible types. This logic is similar to the one already present in `Option`. Somehow, this constraint was never added for `Result`. Review: https://reviews.apache.org/r/47987/ commit a6b3d1ad6f4e1b83b48ac58ba247a422fac32101 Author: Anand Mazumdar Date: Tue May 31 13:28:43 2016 -0600 Added move constructor/assignment operator to `Result`. Note that `Some` still makes a copy and would be fixed in a separate patch. Review: https://reviews.apache.org/r/47986/ > Create a small testing doc for the v1 Scheduler/Executor API > > > Key: MESOS-5457 > URL: https://issues.apache.org/jira/browse/MESOS-5457 > Project: Mesos > Issue Type: Improvement >Reporter: Anand Mazumdar >Assignee: Jay Guo > Labels: mesosphere > Fix For: 1.0.0 > > > This is a follow-up JIRA based on the comments from MESOS-3302 around testing > the v1 Scheduler/Executor API. I created a small document that has the > details of the manual testing done by me. The intent of this issue is to > track all the details on this ticket rather than on the epic. 
> Link to the doc: > https://docs.google.com/document/d/1Z8_8pn-x-VYInm12_En-1oP-FxkLzpG8EgC1qQ0eDRY/edit -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-5523) ValueError: A 0.7-series setuptools cannot be installed with distribute. Found one at /usr/lib/python2.7/dist-packages
Vinson Lee created MESOS-5523: - Summary: ValueError: A 0.7-series setuptools cannot be installed with distribute. Found one at /usr/lib/python2.7/dist-packages Key: MESOS-5523 URL: https://issues.apache.org/jira/browse/MESOS-5523 Project: Mesos Issue Type: Bug Components: build Environment: Ubuntu 16.04 Reporter: Vinson Lee {noformat} $ make [...] Building protobuf Python egg ... cd ../3rdparty/libprocess/3rdparty/protobuf-2.5.0/python && \ CC="gcc" \ CXX="g++" \ CFLAGS="-g1 -O0 -Wno-unused-local-typedefs" \ CXXFLAGS="-g1 -O0 -Wno-unused-local-typedefs -std=c++11" \ PYTHONPATH=build/3rdparty/distribute-0.6.26 \ /usr/bin/python setup.py build bdist_egg Traceback (most recent call last): File "setup.py", line 11, in from setuptools import setup, Extension File "build/3rdparty/distribute-0.6.26/setuptools/__init__.py", line 2, in from setuptools.extension import Extension, Library File "build/3rdparty/distribute-0.6.26/setuptools/extension.py", line 5, in from setuptools.dist import _get_unpatched File "build/3rdparty/distribute-0.6.26/setuptools/dist.py", line 6, in from setuptools.command.install import install File "build/3rdparty/distribute-0.6.26/setuptools/command/__init__.py", line 8, in from setuptools.command import install_scripts File "build/3rdparty/distribute-0.6.26/setuptools/command/install_scripts.py", line 3, in from pkg_resources import Distribution, PathMetadata, ensure_directory File "build/3rdparty/distribute-0.6.26/pkg_resources.py", line 2731, in add_activation_listener(lambda dist: dist.activate()) File "build/3rdparty/distribute-0.6.26/pkg_resources.py", line 704, in subscribe callback(dist) File "build/3rdparty/distribute-0.6.26/pkg_resources.py", line 2731, in add_activation_listener(lambda dist: dist.activate()) File "build/3rdparty/distribute-0.6.26/pkg_resources.py", line 2231, in activate self.insert_on(path) File "build/3rdparty/distribute-0.6.26/pkg_resources.py", line 2332, in insert_on "with distribute. 
Found one at %s" % str(self.location)) ValueError: A 0.7-series setuptools cannot be installed with distribute. Found one at /usr/lib/python2.7/dist-packages Makefile:10277: recipe for target '../3rdparty/libprocess/3rdparty/protobuf-2.5.0/python/dist/protobuf-2.5.0-py2.7.egg' failed make[2]: *** [../3rdparty/libprocess/3rdparty/protobuf-2.5.0/python/dist/protobuf-2.5.0-py2.7.egg] Error 1 make[2]: Leaving directory 'build/src' Makefile:2805: recipe for target 'all' failed make[1]: *** [all] Error 2 make[1]: Leaving directory 'build/src' Makefile:731: recipe for target 'all-recursive' failed make: *** [all-recursive] Error 1 {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-5503) Implement GET_MAINTENANCE_STATUS Call in v1 master API.
[ https://issues.apache.org/jira/browse/MESOS-5503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] haosdent reassigned MESOS-5503: --- Assignee: haosdent > Implement GET_MAINTENANCE_STATUS Call in v1 master API. > --- > > Key: MESOS-5503 > URL: https://issues.apache.org/jira/browse/MESOS-5503 > Project: Mesos > Issue Type: Task >Reporter: Vinod Kone >Assignee: haosdent > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4642) Mesos Agent Json API can dump binary data from log files out as invalid JSON
[ https://issues.apache.org/jira/browse/MESOS-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15308222#comment-15308222 ] Chris Pennello commented on MESOS-4642: --- If we aren't going to modify how the code works, then at the very least document that {{files/read.json}} is _not_ guaranteed to return valid JSON. For example, a consequential but not _too_ unreasonable workaround is to ensure that your files are themselves UTF-8 encoded; that would be a helpful thing to mention in the endpoint documentation. > Mesos Agent Json API can dump binary data from log files out as invalid JSON > > > Key: MESOS-4642 > URL: https://issues.apache.org/jira/browse/MESOS-4642 > Project: Mesos > Issue Type: Bug > Components: json api, slave >Affects Versions: 0.27.0 >Reporter: Steven Schlansker >Priority: Critical > > One of our tasks accidentally started logging binary data to stderr. This > was not intentional and generally should not happen -- however, it causes > severe problems with the Mesos Agent "files/read.json" API, since it gladly > dumps this binary data out as invalid JSON. 
> {code} > # hexdump -C /path/to/task/stderr | tail > 0003d1f0 6f 6e 6e 65 63 74 69 6f 6e 0a 4e 45 54 3a 20 31 |onnection.NET: 1| > 0003d200 20 6f 6e 72 65 61 64 20 45 4e 4f 45 4e 54 20 32 | onread ENOENT 2| > 0003d210 39 35 34 35 36 20 32 35 31 20 32 39 35 37 30 37 |95456 251 295707| > 0003d220 0a 01 00 00 00 00 00 00 ac 57 65 64 2c 20 31 30 |.Wed, 10| > 0003d230 20 55 6e 72 65 63 6f 67 6e 69 7a 65 64 20 69 6e | Unrecognized in| > 0003d240 70 75 74 20 68 65 61 64 65 72 0a |put header.| > {code} > {code} > # curl > 'http://agent-host:5051/files/read.json?path=/path/to/task/stderr=220443=9=' > | hexdump -C > 7970 6e 65 63 74 69 6f 6e 5c 6e 4e 45 54 3a 20 31 20 |nection\nNET: 1 | > 7980 6f 6e 72 65 61 64 20 45 4e 4f 45 4e 54 20 32 39 |onread ENOENT 29| > 7990 35 34 35 36 20 32 35 31 20 32 39 35 37 30 37 5c |5456 251 295707\| > 79a0 6e 5c 75 30 30 30 31 5c 75 30 30 30 30 5c 75 30 |n\u0001\u\u0| > 79b0 30 30 30 5c 75 30 30 30 30 5c 75 30 30 30 30 5c |000\u\u\| > 79c0 75 30 30 30 30 5c 75 30 30 30 30 ac 57 65 64 2c |u\u.Wed,| > 79d0 20 31 30 20 55 6e 72 65 63 6f 67 6e 69 7a 65 64 | 10 Unrecognized| > 79e0 20 69 6e 70 75 74 20 68 65 61 64 65 72 5c 6e 22 | input header\n"| > 79f0 2c 22 6f 66 66 73 65 74 22 3a 32 32 30 34 34 33 |,"offset":220443| > 7a00 7d|}| > {code} > This causes downstream sadness: > {code} > ERROR [2016-02-10 18:55:12,303] > io.dropwizard.jersey.errors.LoggingExceptionMapper: Error handling a request: > 0ee749630f8b26f1 > ! com.fasterxml.jackson.core.JsonParseException: Invalid UTF-8 start byte 0xac > ! at [Source: org.jboss.netty.buffer.ChannelBufferInputStream@6d69ee8; line: > 1, column: 31181] > ! at > com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1487) > ~[singularity-0.4.9.jar:0.4.9] > ! at > com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:518) > ~[singularity-0.4.9.jar:0.4.9] > ! 
at > com.fasterxml.jackson.core.json.UTF8StreamJsonParser._reportInvalidInitial(UTF8StreamJsonParser.java:3339) > ~[singularity-0.4.9.jar:0.4.9] > ! at > com.fasterxml.jackson.core.json.UTF8StreamJsonParser._reportInvalidChar(UTF8StreamJsonParser.java:) > ~[singularity-0.4.9.jar:0.4.9] > ! at > com.fasterxml.jackson.core.json.UTF8StreamJsonParser._finishString2(UTF8StreamJsonParser.java:2360) > ~[singularity-0.4.9.jar:0.4.9] > ! at > com.fasterxml.jackson.core.json.UTF8StreamJsonParser._finishString(UTF8StreamJsonParser.java:2287) > ~[singularity-0.4.9.jar:0.4.9] > ! at > com.fasterxml.jackson.core.json.UTF8StreamJsonParser.getText(UTF8StreamJsonParser.java:286) > ~[singularity-0.4.9.jar:0.4.9] > ! at > com.fasterxml.jackson.databind.deser.std.StringDeserializer.deserialize(StringDeserializer.java:29) > ~[singularity-0.4.9.jar:0.4.9] > ! at > com.fasterxml.jackson.databind.deser.std.StringDeserializer.deserialize(StringDeserializer.java:12) > ~[singularity-0.4.9.jar:0.4.9] > ! at > com.fasterxml.jackson.databind.deser.SettableBeanProperty.deserialize(SettableBeanProperty.java:523) > ~[singularity-0.4.9.jar:0.4.9] > ! at > com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeUsingPropertyBased(BeanDeserializer.java:381) > ~[singularity-0.4.9.jar:0.4.9] > ! at > com.fasterxml.jackson.databind.deser.BeanDeserializerBase.deserializeFromObjectUsingNonDefault(BeanDeserializerBase.java:1073) > ~[singularity-0.4.9.jar:0.4.9] > ! at >
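The UTF-8 workaround suggested in the comment above (keep the log files themselves valid UTF-8) is cheap to check. A minimal illustrative sketch, not part of Mesos:

```python
# Minimal sketch (not part of Mesos): verify that a task log file is
# valid UTF-8 before assuming /files/read.json will return JSON that
# strict parsers accept.
def is_valid_utf8(data: bytes) -> bool:
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

# The stray 0xac byte visible in the hexdump above is an invalid UTF-8
# start byte -- exactly the kind of input that produced the downstream
# "Invalid UTF-8 start byte 0xac" parse error.
clean = b"onread ENOENT 295456 251 295707\n"
dirty = b"\x01\x00\xacWed, 10 Unrecognized input header\n"
```

Running such a check on the producer side (or sanitizing/replacing invalid bytes before serving) would prevent the invalid JSON from ever reaching consumers like the Jackson parser in the stack trace.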
[jira] [Commented] (MESOS-5339) Create Tests for testing fine-grained HTTP endpoint filtering.
[ https://issues.apache.org/jira/browse/MESOS-5339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15308030#comment-15308030 ] Michael Park commented on MESOS-5339: - [~adam-mesos] We are planning to commit this today. The patch is at https://reviews.apache.org/r/48054/. > Create Tests for testing fine-grained HTTP endpoint filtering. > -- > > Key: MESOS-5339 > URL: https://issues.apache.org/jira/browse/MESOS-5339 > Project: Mesos > Issue Type: Improvement >Reporter: Joerg Schad >Assignee: Joerg Schad > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5040) Add cgroups_subsystems flag for cgroups unified isolator
[ https://issues.apache.org/jira/browse/MESOS-5040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15307851#comment-15307851 ] haosdent commented on MESOS-5040: - Hi, [~qianzhang] Thank you for your reply. Please follow this chain. The above one is discarded. > Add cgroups_subsystems flag for cgroups unified isolator > > > Key: MESOS-5040 > URL: https://issues.apache.org/jira/browse/MESOS-5040 > Project: Mesos > Issue Type: Task > Components: cgroups, isolation >Reporter: haosdent >Assignee: haosdent > > In past, we specify the cgroups subsystems we used in Mesos containerizer in > {{--isolation}} flag. In cgroups unified isolator, we need to add this > separate flag to control which subsystems we enable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5040) Add cgroups_subsystems flag for cgroups unified isolator
[ https://issues.apache.org/jira/browse/MESOS-5040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15307841#comment-15307841 ] Qian Zhang commented on MESOS-5040: --- [~haosd...@gmail.com], is the above patch reviewable now? If yes, can you please make it public? :-) > Add cgroups_subsystems flag for cgroups unified isolator > > > Key: MESOS-5040 > URL: https://issues.apache.org/jira/browse/MESOS-5040 > Project: Mesos > Issue Type: Task > Components: cgroups, isolation >Reporter: haosdent >Assignee: haosdent > > In past, we specify the cgroups subsystems we used in Mesos containerizer in > {{--isolation}} flag. In cgroups unified isolator, we need to add this > separate flag to control which subsystems we enable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5339) Create Tests for testing fine-grained HTTP endpoint filtering.
[ https://issues.apache.org/jira/browse/MESOS-5339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15307568#comment-15307568 ] Adam B commented on MESOS-5339: --- [~js84], [~mcypark], are you guys planning to add tests for 0.29/1.0? If not, let's remove this from the parent Epic MESOS-4931 and close out the Epic now that the rest of its tasks are resolved. > Create Tests for testing fine-grained HTTP endpoint filtering. > -- > > Key: MESOS-5339 > URL: https://issues.apache.org/jira/browse/MESOS-5339 > Project: Mesos > Issue Type: Improvement >Reporter: Joerg Schad >Assignee: Joerg Schad > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4772) TaskInfo/ExecutorInfo should include fine-grained ownership/namespacing
[ https://issues.apache.org/jira/browse/MESOS-4772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam B updated MESOS-4772: -- Assignee: (was: Jan Schlicht) > TaskInfo/ExecutorInfo should include fine-grained ownership/namespacing > --- > > Key: MESOS-4772 > URL: https://issues.apache.org/jira/browse/MESOS-4772 > Project: Mesos > Issue Type: Improvement > Components: security >Reporter: Adam B > Labels: authorization, mesosphere, ownership, security > > We need a way to assign fine-grained ownership to tasks/executors so that > multi-user frameworks can tell Mesos to associate the task with a user > identity (rather than just the framework principal+role). Then, when an HTTP > user requests to view the task's sandbox contents, or kill the task, or list > all tasks, the authorizer can determine whether to allow/deny/filter the > request based on finer-grained, user-level ownership. > Some systems may want TaskInfo.owner to represent a group rather than an > individual user. That's fine as long as the framework sets the field to the > group ID in such a way that a group-aware authorizer can interpret it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2717) Qemu/KVM containerizer
[ https://issues.apache.org/jira/browse/MESOS-2717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15307274#comment-15307274 ] Jie Yu commented on MESOS-2717: --- Yes, I don't disagree. But if you take a look at the existing Docker containerizer, much of the logic is shared with the Mesos containerizer. I don't want us to introduce yet another containerizer and copy the core logic yet again. Also, writing a containerizer is highly non-trivial and hard to get right. > Qemu/KVM containerizer > -- > > Key: MESOS-2717 > URL: https://issues.apache.org/jira/browse/MESOS-2717 > Project: Mesos > Issue Type: Wish > Components: containerization >Reporter: Pierre-Yves Ritschard >Assignee: Abhishek Dasgupta > > I think it would make sense for Mesos to have the ability to treat > hypervisors as containerizers and the most sensible one to start with would > probably be Qemu/KVM. > There are a few workloads that can require full-fledged VMs (the most obvious > one being Windows workloads). > The containerization code is well decoupled and seems simple enough, I can > definitely take a shot at it. VMs do bring some questions with them here is > my take on them: > 1. Routing, network strategy > == > The simplest approach here might very well be to go for bridged networks > and leave the setup and inter slave routing up to the administrator > 2. IP Address assignment > > At first, it can be up to the Frameworks to deal with IP assignment. > The simplest way to address this could be to have an executor running > on slaves providing the qemu/kvm containerizer which would instrument a DHCP > server and collect IP + Mac address resources from slaves. While it may be up > to the frameworks to provide this, an example should most likely be provided. > 3. VM Templates > == > VM templates should probably leverage the fetcher and could thus be copied > locally or fetch from HTTP(s) / HDFS. > 4. 
Resource limiting > > Mapping resouce constraints to the qemu command line is probably the easiest > part, Additional command line should also be fetchable. For Unix VMs, the > sandbox could show the output of the serial console > 5. Libvirt / plain Qemu > = > I tend to favor limiting the amount of necessary hoops to jump through and > would thus investigate working directly with Qemu, maintaining an open > connection to the monitor to assert status. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
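Point 4 of the proposal, mapping resource constraints onto a qemu command line, can be sketched as below. This is an illustrative sketch only: the `qemu_args` helper and the `{name: scalar}` resource map are invented for this example and are not part of any actual Mesos containerizer, though the qemu flags themselves (`-enable-kvm`, `-smp`, `-m`, `-drive`, `-nographic`) are real.

```python
# Hypothetical sketch: translating Mesos-style resource constraints
# (cpus, mem in MB) into a qemu-system command line, per point 4 of
# the MESOS-2717 proposal. Helper name and resource-map shape are
# illustrative; the qemu flags used are standard.

def qemu_args(resources, image_path):
    """Translate a {name: scalar} resource map into qemu CLI arguments."""
    cpus = int(resources.get("cpus", 1))
    mem_mb = int(resources.get("mem", 512))
    return [
        "qemu-system-x86_64",
        "-enable-kvm",               # require hardware virtualization (KVM)
        "-smp", str(cpus),           # vCPU count from the 'cpus' resource
        "-m", str(mem_mb),           # guest RAM in MB from the 'mem' resource
        "-drive", "file=%s,format=qcow2" % image_path,  # fetched VM template
        "-nographic",                # serial console; could land in the sandbox
    ]

if __name__ == "__main__":
    print(" ".join(qemu_args({"cpus": 2, "mem": 2048}, "/tmp/template.qcow2")))
```

With `-nographic`, qemu multiplexes the serial console onto stdio, which lines up with the proposal's idea of showing console output in the task sandbox.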
[jira] [Commented] (MESOS-2717) Qemu/KVM containerizer
[ https://issues.apache.org/jira/browse/MESOS-2717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15307270#comment-15307270 ]

Angus Lees commented on MESOS-2717:
-----------------------------------

> Every time we introduce a new feature (e.g., persistent volume, gpu, etc.),
> we need to provide two implementations for both containerizers.

You're going to need to implement those features again for a VM-based "containerizer" anyway. It is highly unlikely that you could share any implementation, regardless of where the code actually lived...

> Qemu/KVM containerizer
> ----------------------
>
>                 Key: MESOS-2717
>                 URL: https://issues.apache.org/jira/browse/MESOS-2717