[jira] [Created] (MESOS-9486) Set up `object.value` for `CREATE_DISK` and `DESTROY_DISK` authorizations.
Chun-Hung Hsiao created MESOS-9486:

Summary: Set up `object.value` for `CREATE_DISK` and `DESTROY_DISK` authorizations.
Key: MESOS-9486
URL: https://issues.apache.org/jira/browse/MESOS-9486
Project: Mesos
Issue Type: Improvement
Components: master
Reporter: Chun-Hung Hsiao
Assignee: Chun-Hung Hsiao

We should be defensive and set {{object.value}} to the role of the resource for the authorization actions {{CREATE_BLOCK_DISK}}, {{DESTROY_BLOCK_DISK}}, {{CREATE_MOUNT_DISK}} and {{DESTROY_MOUNT_DISK}}, so that an old-school authorizer can rely on the field to perform authorization. This behavior is deprecated, though, and will be removed once all `*_WITH_ROLE` authorization action aliases are removed.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
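The defensive behavior proposed in MESOS-9486 could be sketched roughly as follows. This is an illustration in Python, not the Mesos master code (which is C++): the `AuthorizationObject` class and the `make_disk_authorization_object` helper are hypothetical names, and only the four action names and the idea of copying the resource's role into `object.value` come from the ticket.

```python
from dataclasses import dataclass
from typing import Optional

# Action names taken from the ticket; everything else here is hypothetical.
DISK_ACTIONS = {
    "CREATE_BLOCK_DISK",
    "DESTROY_BLOCK_DISK",
    "CREATE_MOUNT_DISK",
    "DESTROY_MOUNT_DISK",
}


@dataclass
class AuthorizationObject:
    """Hypothetical stand-in for the object handed to an authorizer."""
    resource_role: str            # role carried by the resource itself
    value: Optional[str] = None   # legacy field read by older authorizers


def make_disk_authorization_object(action: str, resource_role: str) -> AuthorizationObject:
    """Build the object for an action, also filling the legacy `value` field."""
    obj = AuthorizationObject(resource_role=resource_role)
    if action in DISK_ACTIONS:
        # Defensive: mirror the role into `value` so that an authorizer which
        # only inspects `value` still sees the role of the resource.
        obj.value = resource_role
    return obj
```

Under these assumptions, an authorizer that reads only `value` gets the resource's role for the four disk actions, while other actions leave the field unset.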
[jira] [Created] (MESOS-9485) Unit test for master operation authorization.
Chun-Hung Hsiao created MESOS-9485:

Summary: Unit test for master operation authorization.
Key: MESOS-9485
URL: https://issues.apache.org/jira/browse/MESOS-9485
Project: Mesos
Issue Type: Task
Components: test
Reporter: Chun-Hung Hsiao
Assignee: Chun-Hung Hsiao

We should create a unit test for MESOS-9474 and MESOS-9480.
[jira] [Commented] (MESOS-9482) Resource provider manager can crash on invalid data from resource providers
[ https://issues.apache.org/jira/browse/MESOS-9482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723277#comment-16723277 ]

Chun-Hung Hsiao commented on MESOS-9482:

I actually created MESOS-9407 a while ago. Closing that one as a duplicate of this one, since this one describes the problem more generally.

> Resource provider manager can crash on invalid data from resource providers
>
> Key: MESOS-9482
> URL: https://issues.apache.org/jira/browse/MESOS-9482
> Project: Mesos
> Issue Type: Bug
> Reporter: Benjamin Bannier
> Priority: Major
>
> The resource provider manager code currently contains a number of assertions
> which will crash the manager (and its agent) if some forms of invalid data
> are received from a resource provider. This is dangerous since resource
> providers are not necessarily part of Mesos-controlled code (they talk to the
> manager over an HTTP API and could even be in external processes).
> Instead of crashing, the resource provider manager should disconnect the
> resource providers in such scenarios.
[jira] [Commented] (MESOS-9459) Reviewbot is not verifying reviews that need verification
[ https://issues.apache.org/jira/browse/MESOS-9459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16722957#comment-16722957 ]

Till Toenshoff commented on MESOS-9459:

https://reviews.apache.org/r/69559

> Reviewbot is not verifying reviews that need verification
>
> Key: MESOS-9459
> URL: https://issues.apache.org/jira/browse/MESOS-9459
> Project: Mesos
> Issue Type: Bug
> Reporter: Vinod Kone
> Assignee: Armand Grillet
> Priority: Major
> Labels: ci, integration
>
> For example this run of ReviewBot
> https://builds.apache.org/view/M-R/view/Mesos/job/Mesos-Reviewbot/23594/console
> says that there are no reviews to be verified, which is false because if we
> look at ReviewBoard there are a bunch of reviews that have not been commented
> on by ReviewBot since a new diff has been posted.
> {noformat}
> 12-05-18_23:41:54 - Running
> /home/jenkins/jenkins-slave/workspace/Mesos-Reviewbot/support/verify-reviews.py
> 0 review requests need verification
> {noformat}
> I see the logic of the verify-reviews.py script was changed as part of
> the python3 transition here: https://reviews.apache.org/r/68619/diff/1#27
> which likely caused the bug.
> As an aside, it's unfortunate that the python3 update was bundled with logic
> changes in this review. cc [~andschwa]
[jira] [Created] (MESOS-9484) GroupTest.GroupDataWithDisconnect is flaky
Benno Evers created MESOS-9484:

Summary: GroupTest.GroupDataWithDisconnect is flaky
Key: MESOS-9484
URL: https://issues.apache.org/jira/browse/MESOS-9484
Project: Mesos
Issue Type: Bug
Environment: Mac OSX w/ libevent
Reporter: Benno Evers

Observed the following error in our CI:
{noformat}
../../src/tests/group_tests.cpp:129: Failure
data.get() is NONE
{noformat}

Full log:
{noformat}
[ RUN ] GroupTest.GroupDataWithDisconnect
I1214 15:06:53.386937 398710208 zookeeper_test_server.cpp:156] Started ZooKeeperTestServer on port 51193
2018-12-14 15:06:53,387:69505(0x739ee000):ZOO_INFO@log_env@753: Client environment:zookeeper.version=zookeeper C client 3.4.8
2018-12-14 15:06:53,387:69505(0x739ee000):ZOO_INFO@log_env@757: Client environment:host.name=Jenkinss-Mac-mini.local
2018-12-14 15:06:53,387:69505(0x739ee000):ZOO_INFO@log_env@764: Client environment:os.name=Darwin
2018-12-14 15:06:53,387:69505(0x739ee000):ZOO_INFO@log_env@765: Client environment:os.arch=18.2.0
2018-12-14 15:06:53,387:69505(0x739ee000):ZOO_INFO@log_env@766: Client environment:os.version=Darwin Kernel Version 18.2.0: Mon Nov 12 20:24:46 PST 2018; root:xnu-4903.231.4~2/RELEASE_X86_64
2018-12-14 15:06:53,387:69505(0x739ee000):ZOO_INFO@log_env@774: Client environment:user.name=jenkins
2018-12-14 15:06:53,387:69505(0x739ee000):ZOO_INFO@log_env@782: Client environment:user.home=/Users/jenkins
2018-12-14 15:06:53,387:69505(0x739ee000):ZOO_INFO@log_env@794: Client environment:user.dir=/Users/jenkins/workspace/workspace/mesos/Mesos_CI-build/FLAG/SSL/label/mac/mesos/build
2018-12-14 15:06:53,387:69505(0x739ee000):ZOO_INFO@zookeeper_init@827: Initiating client connection, host=127.0.0.1:51193 sessionTimeout=1 watcher=0x11a65f9a0 sessionId=0 sessionPasswd= context=0x7fcd06163550 flags=0
2018-12-14 15:06:53,387:69505(0x74415000):ZOO_INFO@check_events@1764: initiated connection to server [127.0.0.1:51193]
2018-12-14 15:06:53,389:69505(0x74415000):ZOO_INFO@check_events@1811: session establishment complete on server [127.0.0.1:51193], sessionId=0x167aef9004a, negotiated timeout=1
I1214 15:06:53.389168 60743680 group.cpp:341] Group process (zookeeper-group(40)@10.0.49.4:49309) connected to ZooKeeper
I1214 15:06:53.389210 60743680 group.cpp:831] Syncing group operations: queue size (joins, cancels, datas) = (1, 0, 0)
I1214 15:06:53.389227 60743680 group.cpp:419] Trying to create path '/test' in ZooKeeper
I1214 15:06:53.392253 398710208 zookeeper_test_server.cpp:116] Shutting down ZooKeeperTestServer on port 51193
2018-12-14 15:06:53,393:69505(0x74415000):ZOO_ERROR@handle_socket_error_msg@1782: Socket [127.0.0.1:51193] zk retcode=-4, errno=64(Host is down): failed while receiving a server response
I1214 15:06:53.393187 59133952 group.cpp:452] Lost connection to ZooKeeper, attempting to reconnect ...
I1214 15:06:53.393661 59670528 group.cpp:700] Trying to get '/test/00' in ZooKeeper
2018-12-14 15:06:53,393:69505(0x74415000):ZOO_ERROR@handle_socket_error_msg@1758: Socket [127.0.0.1:51193] zk retcode=-4, errno=61(Connection refused): server refused to accept the client
I1214 15:06:53.395321 398710208 zookeeper_test_server.cpp:156] Started ZooKeeperTestServer on port 51193
W1214 15:07:04.003191 59670528 group.cpp:495] Timed out waiting to connect to ZooKeeper. Forcing ZooKeeper session (sessionId=167aef9004a) expiration
I1214 15:07:04.003652 59670528 group.cpp:511] ZooKeeper session expired
2018-12-14 15:07:04,004:69505(0x738e8000):ZOO_INFO@zookeeper_close@2579: Freeing zookeeper resources for sessionId=0x167aef9004a
2018-12-14 15:07:04,004:69505(0x739ee000):ZOO_INFO@log_env@753: Client environment:zookeeper.version=zookeeper C client 3.4.8
2018-12-14 15:07:04,004:69505(0x739ee000):ZOO_INFO@log_env@757: Client environment:host.name=Jenkinss-Mac-mini.local
2018-12-14 15:07:04,004:69505(0x739ee000):ZOO_INFO@log_env@764: Client environment:os.name=Darwin
2018-12-14 15:07:04,004:69505(0x739ee000):ZOO_INFO@log_env@765: Client environment:os.arch=18.2.0
2018-12-14 15:07:04,004:69505(0x739ee000):ZOO_INFO@log_env@766: Client environment:os.version=Darwin Kernel Version 18.2.0: Mon Nov 12 20:24:46 PST 2018; root:xnu-4903.231.4~2/RELEASE_X86_64
2018-12-14 15:07:04,004:69505(0x739ee000):ZOO_INFO@log_env@774: Client environment:user.name=jenkins
2018-12-14 15:07:04,004:69505(0x739ee000):ZOO_INFO@log_env@782: Client environment:user.home=/Users/jenkins
2018-12-14 15:07:04,004:69505(0x739ee000):ZOO_INFO@log_env@794: Client environment:user.dir=/Users/jenkins/workspace/workspace/mesos/Mesos_CI-build/FLAG/SSL/label/mac/mesos/build
2018-12-14 15:07:04,004:69505(0x739ee000):ZOO_INFO@zookeeper_init@827: Initiating client connection, host=127.0.0.1:51193 sessionTimeout=1
[jira] [Created] (MESOS-9483) ZooKeeperMasterContenderDetectorTest.NonRetryableFrrors is flaky
Benno Evers created MESOS-9483:

Summary: ZooKeeperMasterContenderDetectorTest.NonRetryableFrrors is flaky
Key: MESOS-9483
URL: https://issues.apache.org/jira/browse/MESOS-9483
Project: Mesos
Issue Type: Bug
Environment: Mac OSX w/ libevent
Reporter: Benno Evers

Observed a failure with the following error:
{noformat}
../../src/tests/master_contender_detector_tests.cpp:409: Failure
Failed to wait 15secs for group1.join("data")
{noformat}

Full log:
{noformat}
[ RUN ] ZooKeeperMasterContenderDetectorTest.NonRetryableFrrors
I1214 15:03:56.036525 398710208 zookeeper_test_server.cpp:156] Started ZooKeeperTestServer on port 50199
2018-12-14 15:03:56,036:69505(0x7396b000):ZOO_INFO@log_env@753: Client environment:zookeeper.version=zookeeper C client 3.4.8
2018-12-14 15:03:56,036:69505(0x7396b000):ZOO_INFO@log_env@757: Client environment:host.name=Jenkinss-Mac-mini.local
2018-12-14 15:03:56,036:69505(0x7396b000):ZOO_INFO@log_env@764: Client environment:os.name=Darwin
2018-12-14 15:03:56,036:69505(0x7396b000):ZOO_INFO@log_env@765: Client environment:os.arch=18.2.0
2018-12-14 15:03:56,036:69505(0x7396b000):ZOO_INFO@log_env@766: Client environment:os.version=Darwin Kernel Version 18.2.0: Mon Nov 12 20:24:46 PST 2018; root:xnu-4903.231.4~2/RELEASE_X86_64
2018-12-14 15:03:56,036:69505(0x7396b000):ZOO_INFO@log_env@774: Client environment:user.name=jenkins
2018-12-14 15:03:56,036:69505(0x7396b000):ZOO_INFO@log_env@782: Client environment:user.home=/Users/jenkins
2018-12-14 15:03:56,036:69505(0x7396b000):ZOO_INFO@log_env@794: Client environment:user.dir=/Users/jenkins/workspace/workspace/mesos/Mesos_CI-build/FLAG/SSL/label/mac/mesos/build
2018-12-14 15:03:56,036:69505(0x7396b000):ZOO_INFO@zookeeper_init@827: Initiating client connection, host=127.0.0.1:50199 sessionTimeout=1 watcher=0x11a65f9a0 sessionId=0 sessionPasswd= context=0x7fcd061125a0 flags=0
2018-12-14 15:03:56,037:69505(0x74415000):ZOO_INFO@check_events@1764: initiated connection to server [127.0.0.1:50199]
2018-12-14 15:03:56,039:69505(0x74415000):ZOO_INFO@check_events@1811: session establishment complete on server [127.0.0.1:50199], sessionId=0x167aef64b83, negotiated timeout=1
I1214 15:03:56.039242 60207104 group.cpp:341] Group process (zookeeper-group(14)@10.0.49.4:49309) connected to ZooKeeper
I1214 15:03:56.039286 60207104 group.cpp:831] Syncing group operations: queue size (joins, cancels, datas) = (1, 0, 0)
I1214 15:03:56.039309 60207104 group.cpp:395] Authenticating with ZooKeeper using digest
2018-12-14 15:04:05,989:69505(0x74415000):ZOO_WARN@zookeeper_interest@1597: Exceeded deadline by 6619ms
2018-12-14 15:04:05,989:69505(0x74415000):ZOO_ERROR@handle_socket_error_msg@1702: Socket [127.0.0.1:50199] zk retcode=-7, errno=60(Operation timed out): connection to 127.0.0.1:50199 timed out (exceeded timeout by 3284ms)
2018-12-14 15:04:05,989:69505(0x74415000):ZOO_WARN@zookeeper_interest@1597: Exceeded deadline by 6619ms
I1214 15:04:05.990031 60207104 group.cpp:452] Lost connection to ZooKeeper, attempting to reconnect ...
2018-12-14 15:04:09,332:69505(0x74415000):ZOO_WARN@zookeeper_interest@1597: Exceeded deadline by 9963ms
2018-12-14 15:04:09,332:69505(0x74415000):ZOO_INFO@check_events@1764: initiated connection to server [127.0.0.1:50199]
2018-12-14 15:04:09,333:69505(0x74415000):ZOO_ERROR@handle_socket_error_msg@1800: Socket [127.0.0.1:50199] zk retcode=-112, errno=70(Stale NFS file handle): sessionId=0x167aef64b83 has expired.
I1214 15:04:09.333552 59670528 group.cpp:511] ZooKeeper session expired
2018-12-14 15:04:09,333:69505(0x738e8000):ZOO_INFO@zookeeper_close@2579: Freeing zookeeper resources for sessionId=0x167aef64b83
2018-12-14 15:04:09,333:69505(0x7375f000):ZOO_INFO@log_env@753: Client environment:zookeeper.version=zookeeper C client 3.4.8
2018-12-14 15:04:09,333:69505(0x7375f000):ZOO_INFO@log_env@757: Client environment:host.name=Jenkinss-Mac-mini.local
2018-12-14 15:04:09,333:69505(0x7375f000):ZOO_INFO@log_env@764: Client environment:os.name=Darwin
2018-12-14 15:04:09,333:69505(0x7375f000):ZOO_INFO@log_env@765: Client environment:os.arch=18.2.0
2018-12-14 15:04:09,333:69505(0x7375f000):ZOO_INFO@log_env@766: Client environment:os.version=Darwin Kernel Version 18.2.0: Mon Nov 12 20:24:46 PST 2018; root:xnu-4903.231.4~2/RELEASE_X86_64
2018-12-14 15:04:09,333:69505(0x7375f000):ZOO_INFO@log_env@774: Client environment:user.name=jenkins
2018-12-14 15:04:09,333:69505(0x7375f000):ZOO_INFO@log_env@782: Client environment:user.home=/Users/jenkins
2018-12-14 15:04:09,333:69505(0x7375f000):ZOO_INFO@log_env@794: Client environment:user.dir=/Users/jenkins/workspace/workspace/mesos/Mesos_CI-build/FLAG/SSL/label/mac/mesos/build
2018-12-14
[jira] [Assigned] (MESOS-9480) Master may skip processing authorization results for `LAUNCH_GROUP`.
[ https://issues.apache.org/jira/browse/MESOS-9480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jan Schlicht reassigned MESOS-9480:

Assignee: Chun-Hung Hsiao (was: Jan Schlicht)

> Master may skip processing authorization results for `LAUNCH_GROUP`.
>
> Key: MESOS-9480
> URL: https://issues.apache.org/jira/browse/MESOS-9480
> Project: Mesos
> Issue Type: Bug
> Components: master
> Affects Versions: 1.5.0, 1.5.1, 1.6.0, 1.6.1, 1.7.0
> Reporter: Chun-Hung Hsiao
> Assignee: Chun-Hung Hsiao
> Priority: Blocker
> Labels: mesosphere
>
> If there is a validation error for {{LAUNCH_GROUP}}, or if there are multiple
> authorization errors for some of the tasks in a {{LAUNCH_GROUP}}, the master
> will skip processing the remaining authorization results, which would result
> in these authorization results being examined by subsequent operations
> incorrectly:
> https://github.com/apache/mesos/blob/3ade731d0c1772206c4afdf56318cfab6356acee/src/master/master.cpp#L5487-L5521
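The failure mode described in MESOS-9480 can be illustrated with a deliberately simplified model. The sketch below assumes that authorization results are kept in a single shared queue, in the same order as the tasks they belong to, and that each operation is expected to consume its own results; the function names are hypothetical and the real logic in `master.cpp` is considerably more involved.

```python
from collections import deque
from typing import Deque, List


def process_launch_group_buggy(results: Deque[bool], num_tasks: int) -> bool:
    """Stops at the first denied task, leaving later results in the queue."""
    for _ in range(num_tasks):
        if not results.popleft():
            # Early exit: the remaining results of this group are never
            # consumed and will be read by whatever operation comes next.
            return False
    return True


def process_launch_group_fixed(results: Deque[bool], num_tasks: int) -> bool:
    """Consumes every result that belongs to the group before deciding."""
    outcomes: List[bool] = [results.popleft() for _ in range(num_tasks)]
    return all(outcomes)
```

With a three-task group whose results are `[False, False, True]`, followed by one result `[True]` for the next operation, the buggy variant returns after the first `False` and leaves `[False, True, True]` in the queue, so the next operation would read a denial that belongs to the group. The fixed variant leaves only `[True]`.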
[jira] [Created] (MESOS-9482) Resource provider manager can crash on invalid data from resource providers
Benjamin Bannier created MESOS-9482:

Summary: Resource provider manager can crash on invalid data from resource providers
Key: MESOS-9482
URL: https://issues.apache.org/jira/browse/MESOS-9482
Project: Mesos
Issue Type: Bug
Reporter: Benjamin Bannier

The resource provider manager code currently contains a number of assertions which will crash the manager (and its agent) if some forms of invalid data are received from a resource provider. This is dangerous since resource providers are not necessarily part of Mesos-controlled code (they talk to the manager over an HTTP API and could even be in external processes).

Instead of crashing, the resource provider manager should disconnect the resource providers in such scenarios.
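The "disconnect instead of crash" approach suggested in MESOS-9482 might look roughly like the following. This is a hypothetical, heavily simplified model in Python: the `ResourceProviderManager` class, its methods, and the message fields checked here are made up for illustration and do not reflect the actual Mesos implementation or its HTTP API.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ResourceProviderManager:
    """Hypothetical, simplified manager; not the Mesos implementation."""
    connected: Dict[str, bool] = field(default_factory=dict)

    def subscribe(self, provider_id: str) -> None:
        self.connected[provider_id] = True

    def validate(self, message: dict) -> Optional[str]:
        """Return an error string for invalid input, or None if it looks sane."""
        if not isinstance(message.get("type"), str):
            return "missing or non-string 'type'"
        # Illustrative check only; the field names are assumptions.
        if message["type"] == "UPDATE_STATE" and "resources" not in message:
            return "UPDATE_STATE without 'resources'"
        return None

    def handle(self, provider_id: str, message: dict) -> bool:
        """Process a message; disconnect the sender instead of asserting."""
        error = self.validate(message)
        if error is not None:
            # An `assert` here would take down the manager (and its agent).
            # Dropping the offending provider keeps everything else running.
            self.connected.pop(provider_id, None)
            return False
        return True
```

The design point is that input from a resource provider is treated as untrusted: validation failures are an expected runtime condition handled by dropping the connection, and assertions are reserved for invariants of the manager's own code.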
[jira] [Created] (MESOS-9481) Registration of frameworks with set but empty ID should not be allowed
Benjamin Bannier created MESOS-9481:

Summary: Registration of frameworks with set but empty ID should not be allowed
Key: MESOS-9481
URL: https://issues.apache.org/jira/browse/MESOS-9481
Project: Mesos
Issue Type: Bug
Components: master
Reporter: Benjamin Bannier

Mesos currently allows frameworks to register with a set but empty ID. Internally this is treated identically to registration without a set ID, and a fair amount of code exists to support this.

We should check whether we really need to provide this level of flexibility. It complicates not only the implementation but also the API, and both lead to code that is conceptually harder to grasp (and therefore tends to be error-prone).

Ideally we should reject a set but empty {{FrameworkID}}.
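The stricter check proposed in MESOS-9481 could be sketched as below. This is only an illustration of the proposal, not current Mesos behavior: the `validate_framework_id` helper is a hypothetical name, `None` stands in for an unset ID, and a string stands in for a set {{FrameworkID}}.

```python
from typing import Optional


def validate_framework_id(framework_id: Optional[str]) -> Optional[str]:
    """Return an error message, or None when the registration is acceptable.

    `None` models an unset ID (a new framework); a non-empty string models
    a set ID. A set but empty string is rejected instead of being silently
    treated like an unset ID.
    """
    if framework_id is None:
        return None
    if framework_id == "":
        return "FrameworkID is set but empty"
    return None
```

Rejecting the empty case up front would leave exactly two states for the rest of the code to reason about, unset and set to a non-empty value.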