[jira] [Created] (MESOS-4649) Speed up ZooKeeperTest.LeaderContender by advance Clock.
haosdent created MESOS-4649: --- Summary: Speed up ZooKeeperTest.LeaderContender by advance Clock. Key: MESOS-4649 URL: https://issues.apache.org/jira/browse/MESOS-4649 Project: Mesos Issue Type: Improvement Reporter: haosdent Assignee: haosdent Priority: Minor ZooKeeperTest.LeaderContender reconnects multiple times. We could advance the Clock to avoid waiting for those reconnect timeouts. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
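The idea in this ticket, advancing a clock instead of waiting out a real timeout, can be illustrated with a small self-contained sketch. `ManualClock` and `countReconnects` are hypothetical names invented for this illustration; this is neither libprocess's Clock nor the actual patch, only an assumption about the general technique.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <map>
#include <utility>

// Hypothetical manual clock for illustration; NOT libprocess's Clock.
class ManualClock
{
public:
  using Millis = std::chrono::milliseconds;

  // Schedules `callback` to fire once `delay` of virtual time has elapsed.
  void after(Millis delay, std::function<void()> callback)
  {
    assert(delay.count() >= 0);
    timers.emplace(now + delay, std::move(callback));
  }

  // Moves virtual time forward by `delta` and fires every timer that
  // comes due on the way, in deadline order, without sleeping.
  void advance(Millis delta)
  {
    const Millis target = now + delta;
    while (!timers.empty() && timers.begin()->first <= target) {
      now = timers.begin()->first; // Jump straight to the deadline.
      std::function<void()> callback = std::move(timers.begin()->second);
      timers.erase(timers.begin());
      callback();
    }
    now = target;
  }

private:
  Millis now{0};
  std::multimap<Millis, std::function<void()>> timers;
};

// Counts how many reconnect attempts fire when the clock is advanced by
// `total`, given one retry every `interval` of virtual time.
int countReconnects(ManualClock::Millis interval, ManualClock::Millis total)
{
  ManualClock clock;
  int attempts = 0;

  // Each attempt re-arms the timer, mimicking a retry loop.
  std::function<void()> retry = [&]() {
    ++attempts;
    clock.after(interval, retry);
  };
  clock.after(interval, retry);

  clock.advance(total);
  return attempts;
}
```

With a clock like this, a test that would otherwise wait for several real reconnect timeouts only pays for the callbacks themselves.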
[jira] [Commented] (MESOS-4159) Speed up GroupTest.*
[ https://issues.apache.org/jira/browse/MESOS-4159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15142471#comment-15142471 ] haosdent commented on MESOS-4159: - GroupTest.GroupPathWithRestrictivePerms depends on [MESOS-4648|https://issues.apache.org/jira/browse/MESOS-4648]. > Speed up GroupTest.* > > > Key: MESOS-4159 > URL: https://issues.apache.org/jira/browse/MESOS-4159 > Project: Mesos > Issue Type: Epic > Components: technical debt, test >Reporter: Alexander Rukletsov >Assignee: haosdent >Priority: Minor > Labels: mesosphere, newbie++, tech-debt > > Execution times on Mac OS 10.10.4: > {code} > GroupTest.GroupJoinWithDisconnect (3352 ms) > GroupTest.GroupDataWithDisconnect (3350 ms) > GroupTest.GroupCancelWithDisconnect (2013 ms) > GroupTest.GroupPathWithRestrictivePerms (13368 ms) > GroupTest.RetryableErrors (26720 ms) > {code}
[jira] [Comment Edited] (MESOS-4649) Speed up ZooKeeperTest.LeaderContender by advance Clock.
[ https://issues.apache.org/jira/browse/MESOS-4649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15142404#comment-15142404 ] haosdent edited comment on MESOS-4649 at 2/11/16 8:32 AM: -- Patch: https://reviews.apache.org/r/43472 was (Author: haosd...@gmail.com): Patch: https://issues.apache.org/jira/browse/MESOS-4649 > Speed up ZooKeeperTest.LeaderContender by advance Clock. > > > Key: MESOS-4649 > URL: https://issues.apache.org/jira/browse/MESOS-4649 > Project: Mesos > Issue Type: Improvement >Reporter: haosdent >Assignee: haosdent >Priority: Minor > > ZooKeeperTest.LeaderContender reconnect multiple times. We could use advance > to avoid those reconnect timeout. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-4652) Speed up ZooKeeperMasterContenderDetectorTest.MasterDetectorExpireSlaveZKSessionNewMaster by advance Clock.
haosdent created MESOS-4652: --- Summary: Speed up ZooKeeperMasterContenderDetectorTest.MasterDetectorExpireSlaveZKSessionNewMaster by advance Clock. Key: MESOS-4652 URL: https://issues.apache.org/jira/browse/MESOS-4652 Project: Mesos Issue Type: Improvement Reporter: haosdent Assignee: haosdent Priority: Minor ZooKeeperMasterContenderDetectorTest.MasterDetectorExpireSlaveZKSession contains reconnect. We could use advance to avoid the expired timeout and speed up reconnecting. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4651) Speed up ZooKeeperMasterContenderDetectorTest.MasterDetectorExpireSlaveZKSession by advance Clock.
[ https://issues.apache.org/jira/browse/MESOS-4651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] haosdent updated MESOS-4651: Description: ZooKeeperMasterContenderDetectorTest.MasterDetectorExpireSlaveZKSession contains reconnect. We could use advance to avoid the expired timeout and speed up reconnecting. (was: ZooKeeperMasterContenderDetectorTest.ContenderDetectorShutdownNetwork contains reconnect. We could use advance to avoid the expired timeout and speed up reconnecting.) > Speed up > ZooKeeperMasterContenderDetectorTest.MasterDetectorExpireSlaveZKSession by > advance Clock. > -- > > Key: MESOS-4651 > URL: https://issues.apache.org/jira/browse/MESOS-4651 > Project: Mesos > Issue Type: Improvement >Reporter: haosdent >Assignee: haosdent >Priority: Minor > > ZooKeeperMasterContenderDetectorTest.MasterDetectorExpireSlaveZKSession > contains reconnect. We could use advance to avoid the expired timeout and > speed up reconnecting. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4650) Speed up ZooKeeperMasterContenderDetectorTest.ContenderDetectorShutdownNetwork by advance Clock.
[ https://issues.apache.org/jira/browse/MESOS-4650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] haosdent updated MESOS-4650: Description: We could use advance to avoid the expired timeout and speed up reconnecting. (was: ZooKeeperTest.LeaderContender reconnect multiple times. We could use advance to avoid those reconnect timeout.) > Speed up > ZooKeeperMasterContenderDetectorTest.ContenderDetectorShutdownNetwork by > advance Clock. > > > Key: MESOS-4650 > URL: https://issues.apache.org/jira/browse/MESOS-4650 > Project: Mesos > Issue Type: Improvement >Reporter: haosdent >Assignee: haosdent >Priority: Minor > > We could use advance to avoid the expired timeout and speed up reconnecting. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4648) Backport zookeeper slow add_auth patch
[ https://issues.apache.org/jira/browse/MESOS-4648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15142401#comment-15142401 ] haosdent commented on MESOS-4648: - This issue is pending until the patch for [ZOOKEEPER-770|https://issues.apache.org/jira/browse/ZOOKEEPER-770] is merged upstream. > Backport zookeeper slow add_auth patch > -- > > Key: MESOS-4648 > URL: https://issues.apache.org/jira/browse/MESOS-4648 > Project: Mesos > Issue Type: Improvement >Reporter: haosdent >Assignee: haosdent >Priority: Minor > Labels: test, zookeeper > > Backport [ZOOKEEPER-770 Slow add_auth calls with multi-threaded > client|https://issues.apache.org/jira/browse/ZOOKEEPER-770] to fix slow add_auth calls in the C client.
[jira] [Commented] (MESOS-4159) Speed up GroupTest.*
[ https://issues.apache.org/jira/browse/MESOS-4159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15142468#comment-15142468 ]

haosdent commented on MESOS-4159:
-

After applying the patch:
{code}
[ OK ] GroupTest.GroupJoinWithDisconnect (405 ms)
[ OK ] GroupTest.GroupDataWithDisconnect (192 ms)
[ OK ] GroupTest.GroupCancelWithDisconnect (250 ms)
[ OK ] GroupTest.GroupPathWithRestrictivePerms (334 ms)
[ OK ] GroupTest.RetryableErrors (341 ms)
{code}

> Speed up GroupTest.*
>
> Key: MESOS-4159
> URL: https://issues.apache.org/jira/browse/MESOS-4159
> Project: Mesos
> Issue Type: Epic
> Components: technical debt, test
> Reporter: Alexander Rukletsov
> Assignee: haosdent
> Priority: Minor
> Labels: mesosphere, newbie++, tech-debt
>
> Execution times on Mac OS 10.10.4:
> {code}
> GroupTest.GroupJoinWithDisconnect (3352 ms)
> GroupTest.GroupDataWithDisconnect (3350 ms)
> GroupTest.GroupCancelWithDisconnect (2013 ms)
> GroupTest.GroupPathWithRestrictivePerms (13368 ms)
> GroupTest.RetryableErrors (26720 ms)
> {code}
[jira] [Updated] (MESOS-4650) Speed up ZooKeeperMasterContenderDetectorTest.ContenderDetectorShutdownNetwork by advance Clock.
[ https://issues.apache.org/jira/browse/MESOS-4650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] haosdent updated MESOS-4650: Description: ZooKeeperMasterContenderDetectorTest.ContenderDetectorShutdownNetwork contains reconnect. We could use advance to avoid the expired timeout and speed up reconnecting. (was: We could use advance to avoid the expired timeout and speed up reconnecting.) > Speed up > ZooKeeperMasterContenderDetectorTest.ContenderDetectorShutdownNetwork by > advance Clock. > > > Key: MESOS-4650 > URL: https://issues.apache.org/jira/browse/MESOS-4650 > Project: Mesos > Issue Type: Improvement >Reporter: haosdent >Assignee: haosdent >Priority: Minor > > ZooKeeperMasterContenderDetectorTest.ContenderDetectorShutdownNetwork > contains reconnect. We could use advance to avoid the expired timeout and > speed up reconnecting. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-4651) Speed up ZooKeeperMasterContenderDetectorTest.MasterDetectorExpireSlaveZKSession by advance Clock.
haosdent created MESOS-4651: --- Summary: Speed up ZooKeeperMasterContenderDetectorTest.MasterDetectorExpireSlaveZKSession by advance Clock. Key: MESOS-4651 URL: https://issues.apache.org/jira/browse/MESOS-4651 Project: Mesos Issue Type: Improvement Reporter: haosdent Assignee: haosdent Priority: Minor ZooKeeperMasterContenderDetectorTest.ContenderDetectorShutdownNetwork contains reconnect. We could use advance to avoid the expired timeout and speed up reconnecting. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-4650) Speed up ZooKeeperMasterContenderDetectorTest.ContenderDetectorShutdownNetwork by advance Clock.
haosdent created MESOS-4650: --- Summary: Speed up ZooKeeperMasterContenderDetectorTest.ContenderDetectorShutdownNetwork by advance Clock. Key: MESOS-4650 URL: https://issues.apache.org/jira/browse/MESOS-4650 Project: Mesos Issue Type: Improvement Reporter: haosdent Assignee: haosdent Priority: Minor ZooKeeperTest.LeaderContender reconnect multiple times. We could use advance to avoid those reconnect timeout. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4633) Tests will dereference stack allocated agent objects upon assertion/expectation failure.
[ https://issues.apache.org/jira/browse/MESOS-4633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15142429#comment-15142429 ]

Michael Park commented on MESOS-4633:
-

{noformat}
commit 28917d657a69ee4731375ba54fe67d13816198ff
Author: Joseph Wu
Date: Wed Feb 10 16:56:35 2016 -0800

    Constrain `Option`'s forwarding constructor to constructible types.

    Review: https://reviews.apache.org/r/43434/
{noformat}

> Tests will dereference stack allocated agent objects upon assertion/expectation failure.
>
> Key: MESOS-4633
> URL: https://issues.apache.org/jira/browse/MESOS-4633
> Project: Mesos
> Issue Type: Bug
> Reporter: Joseph Wu
> Assignee: Joseph Wu
> Labels: flaky, mesosphere, tech-debt, test
>
> Tests that use the {{StartSlave}} test helper are generally fragile when the test fails an assert/expect in the middle of the test. This is because the {{StartSlave}} helper takes raw pointer arguments, which may be stack-allocated.
> In case of an assert failure, the test immediately exits (destroying stack allocated objects) and proceeds onto test cleanup. The test cleanup may dereference some of these destroyed objects, leading to a test crash like:
> {code}
> [18:27:36][Step 8/8] F0204 18:27:35.981302 23085 logging.cpp:64] RAW: Pure virtual method called
> [18:27:36][Step 8/8] @ 0x7f7077055e1c google::LogMessage::Fail()
> [18:27:36][Step 8/8] @ 0x7f707705ba6f google::RawLog__()
> [18:27:36][Step 8/8] @ 0x7f70760f76c9 __cxa_pure_virtual
> [18:27:36][Step 8/8] @ 0xa9423c mesos::internal::tests::Cluster::Slaves::shutdown()
> [18:27:36][Step 8/8] @ 0x1074e45 mesos::internal::tests::MesosTest::ShutdownSlaves()
> [18:27:36][Step 8/8] @ 0x1074de4 mesos::internal::tests::MesosTest::Shutdown()
> [18:27:36][Step 8/8] @ 0x1070ec7 mesos::internal::tests::MesosTest::TearDown()
> {code}
> The {{StartSlave}} helper should take {{shared_ptr}} arguments instead.
> This also means that we can remove the {{Shutdown}} helper from most of these tests.
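The ownership change proposed in MESOS-4633 can be sketched in isolation. The classes below are hypothetical stand-ins invented for this illustration, not the Mesos test fixtures; the sketch only assumes that cleanup touches objects the test body created, and shows why co-ownership through `std::shared_ptr` keeps that cleanup safe after the test body has exited.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in; NOT the Mesos TestContainerizer.
struct Containerizer
{
  virtual ~Containerizer() = default;
  virtual std::string name() const { return "test-containerizer"; }
};

// Hypothetical stand-in for a test cluster that cleans up after a test.
class Cluster
{
public:
  // Taking a shared_ptr makes the cluster a co-owner, so the object
  // stays alive until cleanup even if the test body already returned.
  void startAgent(std::shared_ptr<Containerizer> containerizer)
  {
    containerizers.push_back(std::move(containerizer));
  }

  // Cleanup touches every registered object; returns how many it visited.
  std::size_t shutdown()
  {
    std::size_t visited = 0;
    for (const auto& containerizer : containerizers) {
      assert(containerizer != nullptr);
      if (!containerizer->name().empty()) {
        ++visited;
      }
    }
    containerizers.clear();
    return visited;
  }

private:
  std::vector<std::shared_ptr<Containerizer>> containerizers;
};

// Simulates a test body that exits early (as a failed ASSERT would),
// leaving cleanup to run after the test's own locals are gone.
std::size_t runTestThatExitsEarly()
{
  Cluster cluster;
  {
    auto containerizer = std::make_shared<Containerizer>();
    cluster.startAgent(containerizer);
  } // The test's own reference is released here.

  return cluster.shutdown(); // Still safe: the cluster holds a reference.
}
```

With a raw pointer to a stack-allocated object in place of the `shared_ptr`, the same `shutdown()` call would dereference a destroyed object, which is the crash shown in the ticket.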
[jira] [Commented] (MESOS-4164) MasterTest.RecoverResources is slow
[ https://issues.apache.org/jira/browse/MESOS-4164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15144121#comment-15144121 ] haosdent commented on MESOS-4164: - After: {code} [ OK ] MasterTest.RecoverResources (113 ms) {code} > MasterTest.RecoverResources is slow > --- > > Key: MESOS-4164 > URL: https://issues.apache.org/jira/browse/MESOS-4164 > Project: Mesos > Issue Type: Improvement > Components: technical debt, test >Reporter: Alexander Rukletsov >Assignee: haosdent >Priority: Minor > Labels: mesosphere, newbie++, tech-debt > > The {{MasterTest.RecoverResources}} test takes more than {{1s}} to finish on > my Mac OS 10.10.4: > {code} > MasterTest.RecoverResources (1018 ms) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4168) MasterMaintenanceTest.EnterMaintenanceMode is slow
[ https://issues.apache.org/jira/browse/MESOS-4168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15144119#comment-15144119 ] haosdent commented on MESOS-4168: - {code} [ OK ] MasterMaintenanceTest.EnterMaintenanceMode (138 ms) {code} > MasterMaintenanceTest.EnterMaintenanceMode is slow > --- > > Key: MESOS-4168 > URL: https://issues.apache.org/jira/browse/MESOS-4168 > Project: Mesos > Issue Type: Improvement > Components: technical debt, test >Reporter: Alexander Rukletsov >Assignee: haosdent >Priority: Minor > Labels: mesosphere, newbie++, tech-debt > > The {{MasterMaintenanceTest.EnterMaintenanceMode}} test takes more than > {{5s}} to finish on my Mac OS 10.10.4: > {code} > MasterMaintenanceTest.EnterMaintenanceMode (5087 ms) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4169) MasterMaintenanceTest.InverseOffers is slow
[ https://issues.apache.org/jira/browse/MESOS-4169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15144120#comment-15144120 ] haosdent commented on MESOS-4169: - {code} [ OK ] MasterMaintenanceTest.InverseOffers (134 ms) {code} > MasterMaintenanceTest.InverseOffers is slow > --- > > Key: MESOS-4169 > URL: https://issues.apache.org/jira/browse/MESOS-4169 > Project: Mesos > Issue Type: Improvement > Components: technical debt, test >Reporter: Alexander Rukletsov >Assignee: haosdent >Priority: Minor > Labels: mesosphere, newbie++, tech-debt > > The {{MasterMaintenanceTest.InverseOffers}} test takes more than {{2s}} to > finish on my Mac OS 10.10.4: > {code} > MasterMaintenanceTest.InverseOffers (2027 ms) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4172) GarbageCollectorIntegrationTest.Restart is slow
[ https://issues.apache.org/jira/browse/MESOS-4172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15144118#comment-15144118 ] haosdent commented on MESOS-4172: - After {code} [ OK ] GarbageCollectorIntegrationTest.Restart (158 ms) {code} > GarbageCollectorIntegrationTest.Restart is slow > --- > > Key: MESOS-4172 > URL: https://issues.apache.org/jira/browse/MESOS-4172 > Project: Mesos > Issue Type: Improvement > Components: technical debt, test >Reporter: Alexander Rukletsov >Assignee: haosdent >Priority: Minor > Labels: mesosphere, newbie++, tech-debt > > The {{GarbageCollectorIntegrationTest.Restart}} test takes more than {{5s}} > to finish on my Mac OS 10.10.4: > {code} > GarbageCollectorIntegrationTest.Restart (5102 ms) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4167) MasterTest.OfferTimeout is slow
[ https://issues.apache.org/jira/browse/MESOS-4167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15144125#comment-15144125 ] haosdent commented on MESOS-4167: - After {code} [ OK ] MasterTest.OfferTimeout (62 ms) {code} > MasterTest.OfferTimeout is slow > --- > > Key: MESOS-4167 > URL: https://issues.apache.org/jira/browse/MESOS-4167 > Project: Mesos > Issue Type: Improvement > Components: technical debt, test >Reporter: Alexander Rukletsov >Assignee: haosdent >Priority: Minor > Labels: mesosphere, newbie++, tech-debt > > The {{MasterTest.OfferTimeout}} test takes more than {{1s}} to finish on my > Mac OS 10.10.4: > {code} > MasterTest.OfferTimeout (1053 ms) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4165) MasterTest.MasterInfoOnReElection is slow
[ https://issues.apache.org/jira/browse/MESOS-4165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15144122#comment-15144122 ] haosdent commented on MESOS-4165: - After {code} [ OK ] MasterTest.MasterInfoOnReElection (62 ms) {code} > MasterTest.MasterInfoOnReElection is slow > - > > Key: MESOS-4165 > URL: https://issues.apache.org/jira/browse/MESOS-4165 > Project: Mesos > Issue Type: Improvement > Components: technical debt, test >Reporter: Alexander Rukletsov >Assignee: haosdent >Priority: Minor > Labels: mesosphere, newbie++, tech-debt > > The {{MasterTest.MasterInfoOnReElection}} test takes more than {{1s}} to > finish on my Mac OS 10.10.4: > {code} > MasterTest.MasterInfoOnReElection (1024 ms) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4171) OversubscriptionTest.RemoveCapabilitiesOnSchedulerFailover is slow
[ https://issues.apache.org/jira/browse/MESOS-4171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15144127#comment-15144127 ] haosdent commented on MESOS-4171: - After {code} [ OK ] OversubscriptionTest.RemoveCapabilitiesOnSchedulerFailover (56 ms) {code} > OversubscriptionTest.RemoveCapabilitiesOnSchedulerFailover is slow > -- > > Key: MESOS-4171 > URL: https://issues.apache.org/jira/browse/MESOS-4171 > Project: Mesos > Issue Type: Improvement > Components: technical debt, test >Reporter: Alexander Rukletsov >Assignee: haosdent >Priority: Minor > Labels: mesosphere, newbie++, tech-debt > > The {{OversubscriptionTest.RemoveCapabilitiesOnSchedulerFailover}} test takes > more than {{1s}} to finish on my Mac OS 10.10.4: > {code} > OversubscriptionTest.RemoveCapabilitiesOnSchedulerFailover (1018 ms) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4170) OversubscriptionTest.UpdateAllocatorOnSchedulerFailover is slow
[ https://issues.apache.org/jira/browse/MESOS-4170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15144126#comment-15144126 ] haosdent commented on MESOS-4170: - After {code} [ OK ] OversubscriptionTest.UpdateAllocatorOnSchedulerFailover (56 ms) {code} > OversubscriptionTest.UpdateAllocatorOnSchedulerFailover is slow > --- > > Key: MESOS-4170 > URL: https://issues.apache.org/jira/browse/MESOS-4170 > Project: Mesos > Issue Type: Improvement > Components: technical debt, test >Reporter: Alexander Rukletsov >Assignee: haosdent >Priority: Minor > Labels: mesosphere, newbie++, tech-debt > > The {{OversubscriptionTest.UpdateAllocatorOnSchedulerFailover}} test takes > more than {{1s}} to finish on my Mac OS 10.10.4: > {code} > OversubscriptionTest.UpdateAllocatorOnSchedulerFailover (1018 ms) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4166) MasterTest.LaunchCombinedOfferTest is slow
[ https://issues.apache.org/jira/browse/MESOS-4166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15144124#comment-15144124 ] haosdent commented on MESOS-4166: - After {code} [ OK ] MasterTest.LaunchCombinedOfferTest (101 ms) {code} > MasterTest.LaunchCombinedOfferTest is slow > -- > > Key: MESOS-4166 > URL: https://issues.apache.org/jira/browse/MESOS-4166 > Project: Mesos > Issue Type: Improvement > Components: technical debt, test >Reporter: Alexander Rukletsov >Assignee: haosdent >Priority: Minor > Labels: mesosphere, newbie++, tech-debt > > The {{MasterTest.LaunchCombinedOfferTest}} test takes more than {{2s}} to > finish on my Mac OS 10.10.4: > {code} > MasterTest.LaunchCombinedOfferTest (2023 ms) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4160) Log recover tests are slow
[ https://issues.apache.org/jira/browse/MESOS-4160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15144164#comment-15144164 ] haosdent commented on MESOS-4160: - Hi [~lins05] Are you still doing this? > Log recover tests are slow > -- > > Key: MESOS-4160 > URL: https://issues.apache.org/jira/browse/MESOS-4160 > Project: Mesos > Issue Type: Improvement > Components: technical debt, test >Reporter: Alexander Rukletsov >Assignee: Shuai Lin >Priority: Minor > Labels: mesosphere, newbie++, tech-debt > > On Mac OS 10.10.4, some tests take longer than {{1s}} to finish: > {code} > RecoverTest.AutoInitialization (1003 ms) > RecoverTest.AutoInitializationRetry (1000 ms) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3738) Mesos health check is invoked incorrectly when Mesos slave is within the docker container
[ https://issues.apache.org/jira/browse/MESOS-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15144175#comment-15144175 ] haosdent commented on MESOS-3738: - Hi, [~meatmanek] I think you could patch it as I mentioned above. > Mesos health check is invoked incorrectly when Mesos slave is within the > docker container > - > > Key: MESOS-3738 > URL: https://issues.apache.org/jira/browse/MESOS-3738 > Project: Mesos > Issue Type: Bug > Components: containerization, docker >Affects Versions: 0.25.0 > Environment: Docker 1.8.0: > Client: > Version: 1.8.0 > API version: 1.20 > Go version: go1.4.2 > Git commit: 0d03096 > Built:Tue Aug 11 16:48:39 UTC 2015 > OS/Arch: linux/amd64 > Server: > Version: 1.8.0 > API version: 1.20 > Go version: go1.4.2 > Git commit: 0d03096 > Built:Tue Aug 11 16:48:39 UTC 2015 > OS/Arch: linux/amd64 > Host: Ubuntu 14.04 > Container: Debian 8.1 + Java-7 >Reporter: Yong Tang >Assignee: haosdent > Fix For: 0.26.0 > > Attachments: MESOS-3738-0_23_1.patch, MESOS-3738-0_24_1.patch, > MESOS-3738-0_25_0.patch > > > When the Mesos slave runs inside a container, the COMMAND health check from > Marathon is invoked incorrectly. > In such a scenario, the sandbox directory (instead of the > launcher/health-check directory) is used. This results in an error with the > container. 
> Command to invoke the Mesos slave container: > {noformat} > sudo docker run -d -v /sys:/sys -v /usr/bin/docker:/usr/bin/docker:ro -v > /usr/lib/x86_64-linux-gnu/libapparmor.so.1:/usr/lib/x86_64-linux-gnu/libapparmor.so.1:ro > -v /var/run/docker.sock:/var/run/docker.sock -v /tmp/mesos:/tmp/mesos mesos > mesos slave --master=zk://10.2.1.2:2181/mesos --containerizers=docker,mesos > --executor_registration_timeout=5mins --docker_stop_timeout=10secs > --launcher=posix > {noformat} > Marathon JSON file: > {code} > { > "id": "ubuntu", > "container": > { > "type": "DOCKER", > "docker": > { > "image": "ubuntu", > "network": "BRIDGE", > "parameters": [] > } > }, > "args": [ "bash", "-c", "while true; do echo 1; sleep 5; done" ], > "uris": [], > "healthChecks": > [ > { > "protocol": "COMMAND", > "command": { "value": "echo Success" }, > "gracePeriodSeconds": 3000, > "intervalSeconds": 5, > "timeoutSeconds": 5, > "maxConsecutiveFailures": 300 > } > ], > "instances": 1 > } > {code} > {noformat} > STDOUT: > root@cea2be47d64f:/mnt/mesos/sandbox# cat stdout > --container="mesos-e20f8959-cd9f-40ae-987d-809401309361-S0.815cc886-1cd1-4f13-8f9b-54af1f127c3f" > --docker="docker" --docker_socket="/var/run/docker.sock" --help="false" > --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO" > --mapped_directory="/mnt/mesos/sandbox" --quiet="false" > --sandbox_directory="/tmp/mesos/slaves/e20f8959-cd9f-40ae-987d-809401309361-S0/frameworks/e20f8959-cd9f-40ae-987d-809401309361-/executors/ubuntu.86bca10f-72c9-11e5-b36d-02420a020106/runs/815cc886-1cd1-4f13-8f9b-54af1f127c3f" > --stop_timeout="10secs" > --container="mesos-e20f8959-cd9f-40ae-987d-809401309361-S0.815cc886-1cd1-4f13-8f9b-54af1f127c3f" > --docker="docker" --docker_socket="/var/run/docker.sock" --help="false" > --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO" > --mapped_directory="/mnt/mesos/sandbox" --quiet="false" > 
--sandbox_directory="/tmp/mesos/slaves/e20f8959-cd9f-40ae-987d-809401309361-S0/frameworks/e20f8959-cd9f-40ae-987d-809401309361-/executors/ubuntu.86bca10f-72c9-11e5-b36d-02420a020106/runs/815cc886-1cd1-4f13-8f9b-54af1f127c3f" > --stop_timeout="10secs" > Registered docker executor on b01e2e75afcb > Starting task ubuntu.86bca10f-72c9-11e5-b36d-02420a020106 > 1 > Launching health check process: > /tmp/mesos/slaves/e20f8959-cd9f-40ae-987d-809401309361-S0/frameworks/e20f8959-cd9f-40ae-987d-809401309361-/executors/ubuntu.86bca10f-72c9-11e5-b36d-02420a020106/runs/815cc886-1cd1-4f13-8f9b-54af1f127c3f/mesos-health-check > --executor=(1)@10.2.1.7:40695 > --health_check_json={"command":{"shell":true,"value":"docker exec > mesos-e20f8959-cd9f-40ae-987d-809401309361-S0.815cc886-1cd1-4f13-8f9b-54af1f127c3f > sh -c \" echo Success > \""},"consecutive_failures":300,"delay_seconds":0.0,"grace_period_seconds":3000.0,"interval_seconds":5.0,"timeout_seconds":5.0} > --task_id=ubuntu.86bca10f-72c9-11e5-b36d-02420a020106 > Health check process launched at pid: 94 > 1 > 1 > 1 > 1 > 1 > STDERR: > root@cea2be47d64f:/mnt/mesos/sandbox# cat stderr > I1014 23:15:58.12795056 exec.cpp:134] Version: 0.25.0 > I1014 23:15:58.13062762 exec.cpp:208] Executor registered on slave >
[jira] [Assigned] (MESOS-4162) SlaveTest.MetricsSlaveLaunchErrors is slow
[ https://issues.apache.org/jira/browse/MESOS-4162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] haosdent reassigned MESOS-4162: --- Assignee: haosdent > SlaveTest.MetricsSlaveLaunchErrors is slow > -- > > Key: MESOS-4162 > URL: https://issues.apache.org/jira/browse/MESOS-4162 > Project: Mesos > Issue Type: Improvement > Components: technical debt, test >Reporter: Alexander Rukletsov >Assignee: haosdent >Priority: Minor > Labels: mesosphere, newbie++, tech-debt > > The {{SlaveTest.MetricsSlaveLaunchErrors}} test takes around {{1s}} to finish > on my Mac OS 10.10.4: > {code} > SlaveTest.MetricsSlaveLaunchErrors (1009 ms) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-4656) strings::split behaves incorrectly when n=1
Benjamin Mahler created MESOS-4656:
--

Summary: strings::split behaves incorrectly when n=1
Key: MESOS-4656
URL: https://issues.apache.org/jira/browse/MESOS-4656
Project: Mesos
Issue Type: Bug
Components: stout
Reporter: Benjamin Mahler
Assignee: Benjamin Mahler

While looking at the patches for MESOS-3833, I noticed that the code for strings::split behaves incorrectly for n=1 (maximum number of tokens). Adding the following test case demonstrates the issue:

{code}
TEST(StringsTest, SplitNOne)
{
  vector<string> tokens = strings::split("foo,bar,,,", ",", 1);
  ASSERT_EQ(1u, tokens.size());
  EXPECT_EQ("foo,bar,,,", tokens[0]);
}
{code}

This fails as follows:

{noformat}
[ RUN ] StringsTest.SplitNOne
../../../../3rdparty/libprocess/3rdparty/stout/tests/strings_tests.cpp:357: Failure
Value of: tokens.size()
Actual: 5
Expected: 1u
Which is: 1
[ FAILED ] StringsTest.SplitNOne (0 ms)
{noformat}
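For reference, the behavior the test case expects can be sketched with the standard library alone. `splitAtMost` is a hypothetical helper written for this illustration; it is not the stout implementation or the eventual fix, and it assumes `n` is at least 1.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, NOT stout's strings::split: splits `s` on any
// character in `delims`, producing at most `n` tokens (n >= 1). The
// final token holds the unsplit remainder of the string.
std::vector<std::string> splitAtMost(
    const std::string& s,
    const std::string& delims,
    std::size_t n)
{
  assert(n >= 1);

  std::vector<std::string> tokens;
  std::size_t offset = 0;

  // Stop splitting once only one token slot is left, so that n == 1
  // returns the whole input as a single token.
  while (tokens.size() + 1 < n) {
    const std::size_t next = s.find_first_of(delims, offset);
    if (next == std::string::npos) {
      break;
    }
    tokens.push_back(s.substr(offset, next - offset));
    offset = next + 1;
  }

  tokens.push_back(s.substr(offset));
  return tokens;
}
```

The loop condition is the point of the sketch: the token limit is checked before each split, so the `n=1` case never enters the loop at all.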
[jira] [Commented] (MESOS-4646) PortMappingIsolatorTests get kernel stuck.
[ https://issues.apache.org/jira/browse/MESOS-4646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15143334#comment-15143334 ] Cong Wang commented on MESOS-4646: -- OK, the stuck kernel is probably another kernel bug, but without a kernel stack trace I have no idea which bug it is. Could you please try to set up kdump to capture the kernel crash/hang? BTW, here at Twitter we use a 4.1 kernel plus the above fix. I just ran PortMappingIsolatorTest 30 times and all runs passed, so maybe it is a new kernel bug I have never seen before. > PortMappingIsolatorTests get kernel stuck. > -- > > Key: MESOS-4646 > URL: https://issues.apache.org/jira/browse/MESOS-4646 > Project: Mesos > Issue Type: Bug > Environment: Linux Kernel 3.19.9-49-generic, > libnl-3.2.27 >Reporter: Till Toenshoff >Assignee: Cong Wang > > {noformat} > $ sudo ./bin/mesos-tests.sh --gtest_filter="*PortMappingIsolatorTest*" > Source directory: /home/till/scratchpad/mesos > Build directory: /home/till/scratchpad/mesos/build > - > We cannot run any cgroups tests that require mounting > hierarchies because you have the following hierarchies mounted: > /sys/fs/cgroup/blkio, /sys/fs/cgroup/cpu, /sys/fs/cgroup/cpuacct, > /sys/fs/cgroup/cpuset, /sys/fs/cgroup/devices, /sys/fs/cgroup/freezer, > /sys/fs/cgroup/hugetlb, /sys/fs/cgroup/memory, /sys/fs/cgroup/net_cls, > /sys/fs/cgroup/net_prio, /sys/fs/cgroup/perf_event, /sys/fs/cgroup/systemd > We'll disable the CgroupsNoHierarchyTest test fixture for now. 
> - > WARNING: perf not found for kernel 3.19.0-49 > You may need to install the following packages for this specific kernel: > linux-tools-3.19.0-49-generic > linux-cloud-tools-3.19.0-49-generic > You may also want to install one of the following packages to keep up to > date: > linux-tools-generic-lts- > linux-cloud-tools-generic-lts- > - > No 'perf' command found so no 'perf' tests will be run > - > WARNING: perf not found for kernel 3.19.0-49 > You may need to install the following packages for this specific kernel: > linux-tools-3.19.0-49-generic > linux-cloud-tools-3.19.0-49-generic > You may also want to install one of the following packages to keep up to > date: > linux-tools-generic-lts- > linux-cloud-tools-generic-lts- > - > The 'perf' command wasn't found so tests using it > to sample the 'cycles' hardware event will not be run. > - > /bin/nc > /usr/local/bin/curl > Note: Google Test filter = >
[jira] [Created] (MESOS-4659) Consider how to handle orphaned tasks after master failover
Neil Conway created MESOS-4659: -- Summary: Consider how to handle orphaned tasks after master failover Key: MESOS-4659 URL: https://issues.apache.org/jira/browse/MESOS-4659 Project: Mesos Issue Type: Bug Components: master Reporter: Neil Conway If a framework becomes disconnected from the master, its tasks are killed after waiting for {{failover_timeout}}. However, if a master failover occurs but a framework never reconnects to the new master, we never kill any of the tasks associated with that framework. These tasks remain orphaned and presumably would need to be manually removed by the operator. We should consider whether to kill such orphaned tasks automatically, likely after waiting for some (framework-configurable?) timeout. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4660) Document net_cls isolator in docs/mesos-containerizer.md.
[ https://issues.apache.org/jira/browse/MESOS-4660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu updated MESOS-4660: -- Sprint: Mesosphere Sprint 28 Story Points: 1 > Document net_cls isolator in docs/mesos-containerizer.md. > - > > Key: MESOS-4660 > URL: https://issues.apache.org/jira/browse/MESOS-4660 > Project: Mesos > Issue Type: Task >Reporter: Jie Yu > > We need to add a section in the doc to describe how to use cgroups/net_cls > isolator. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-4660) Document net_cls isolator in docs/mesos-containerizer.md.
Jie Yu created MESOS-4660: - Summary: Document net_cls isolator in docs/mesos-containerizer.md. Key: MESOS-4660 URL: https://issues.apache.org/jira/browse/MESOS-4660 Project: Mesos Issue Type: Task Reporter: Jie Yu We need to add a section in the doc to describe how to use cgroups/net_cls isolator. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (MESOS-4633) Tests will dereference stack allocated agent objects upon assertion/expectation failure.
[ https://issues.apache.org/jira/browse/MESOS-4633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15141913#comment-15141913 ]

Joseph Wu edited comment on MESOS-4633 at 2/11/16 7:33 PM:

|| Review || Summary ||
| https://reviews.apache.org/r/43434/ | Change to {{Option}} |
|| Discarded below | (decided to take a different approach) |
| https://reviews.apache.org/r/43435/ | Change to {{StartSlave}} helper |
| https://reviews.apache.org/r/43436/ | Change to {{TestContainerizer}} |
| https://reviews.apache.org/r/43437/
https://reviews.apache.org/r/43438/
https://reviews.apache.org/r/43439/
https://reviews.apache.org/r/43440/
https://reviews.apache.org/r/43441/
https://reviews.apache.org/r/43442/
https://reviews.apache.org/r/43444/
https://reviews.apache.org/r/43445/
https://reviews.apache.org/r/43446/
https://reviews.apache.org/r/43447/
https://reviews.apache.org/r/43448/ | Tons and tons of test changes |

was (Author: kaysoky):
|| Review || Summary ||
| https://reviews.apache.org/r/43434/ | Change to {{Option}} |
| https://reviews.apache.org/r/43435/ | Change to {{StartSlave}} helper |
| https://reviews.apache.org/r/43436/ | Change to {{TestContainerizer}} |
| https://reviews.apache.org/r/43437/
https://reviews.apache.org/r/43438/
https://reviews.apache.org/r/43439/
https://reviews.apache.org/r/43440/
https://reviews.apache.org/r/43441/
https://reviews.apache.org/r/43442/
https://reviews.apache.org/r/43444/
https://reviews.apache.org/r/43445/
https://reviews.apache.org/r/43446/
https://reviews.apache.org/r/43447/
https://reviews.apache.org/r/43448/ | Tons and tons of test changes |

> Tests will dereference stack allocated agent objects upon assertion/expectation failure.
>
> Key: MESOS-4633
> URL: https://issues.apache.org/jira/browse/MESOS-4633
> Project: Mesos
> Issue Type: Bug
> Reporter: Joseph Wu
> Assignee: Joseph Wu
> Labels: flaky, mesosphere, tech-debt, test
>
> Tests that use the {{StartSlave}} test helper are generally fragile when the test fails an assert/expect in the middle of the test. This is because the {{StartSlave}} helper takes raw pointer arguments, which may be stack-allocated.
> In case of an assert failure, the test immediately exits (destroying stack allocated objects) and proceeds onto test cleanup. The test cleanup may dereference some of these destroyed objects, leading to a test crash like:
> {code}
> [18:27:36][Step 8/8] F0204 18:27:35.981302 23085 logging.cpp:64] RAW: Pure virtual method called
> [18:27:36][Step 8/8] @ 0x7f7077055e1c google::LogMessage::Fail()
> [18:27:36][Step 8/8] @ 0x7f707705ba6f google::RawLog__()
> [18:27:36][Step 8/8] @ 0x7f70760f76c9 __cxa_pure_virtual
> [18:27:36][Step 8/8] @ 0xa9423c mesos::internal::tests::Cluster::Slaves::shutdown()
> [18:27:36][Step 8/8] @ 0x1074e45 mesos::internal::tests::MesosTest::ShutdownSlaves()
> [18:27:36][Step 8/8] @ 0x1074de4 mesos::internal::tests::MesosTest::Shutdown()
> [18:27:36][Step 8/8] @ 0x1070ec7 mesos::internal::tests::MesosTest::TearDown()
> {code}
> The {{StartSlave}} helper should take {{shared_ptr}} arguments instead.
> This also means that we can remove the {{Shutdown}} helper from most of these tests.
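The ownership problem described in MESOS-4633 can be illustrated with a small standalone sketch. Nothing below is Mesos code: {{Dependency}}, {{Cluster}} and {{runTestBody}} are hypothetical stand-ins, and the sketch only assumes that the cluster-like object keeps a {{shared_ptr}} copy of whatever the test hands it, so that cleanup never touches an object that was destroyed when the test body returned early.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a test dependency such as a containerizer.
struct Dependency
{
  virtual ~Dependency() = default;
  virtual std::string name() const { return "dependency"; }
};

// Hypothetical stand-in for the test cluster. It keeps shared ownership of
// every dependency it is given, instead of storing raw pointers.
class Cluster
{
public:
  void start(std::shared_ptr<Dependency> dependency)
  {
    dependencies.push_back(std::move(dependency));
  }

  // Called from test teardown. The virtual call is safe here because the
  // cluster's own shared_ptr copies keep the objects alive.
  std::size_t shutdown()
  {
    std::size_t stopped = 0;
    for (const auto& dependency : dependencies) {
      if (!dependency->name().empty()) {
        ++stopped;
      }
    }
    dependencies.clear();
    return stopped;
  }

private:
  std::vector<std::shared_ptr<Dependency>> dependencies;
};

// Simulates a test body that returns early, as a failed ASSERT would. The
// local shared_ptr goes out of scope, but the cluster's copy remains. The
// returned weak_ptr only lets the caller observe the object's lifetime.
inline std::weak_ptr<Dependency> runTestBody(Cluster& cluster)
{
  auto dependency = std::make_shared<Dependency>();
  cluster.start(dependency);
  return dependency;
}
```

With raw pointers to stack objects, the equivalent of {{shutdown()}} would run after the pointee had already been destroyed, which is the "Pure virtual method called" crash in the log above.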
[jira] [Created] (MESOS-4658) process::Connection can lead to deadlock around execution in the same context.
Anand Mazumdar created MESOS-4658:

Summary: process::Connection can lead to deadlock around execution in the same context.
Key: MESOS-4658
URL: https://issues.apache.org/jira/browse/MESOS-4658
Project: Mesos
Issue Type: Bug
Components: HTTP API, libprocess
Reporter: Anand Mazumdar

The {{Connection}} abstraction is prone to deadlocks arising from the object being destroyed inside the same execution context.

Consider this example:
{code}
Option<Connection> connection = process::http::connect(...);

connection.disconnected()
  .onAny(defer(self(), , connection));

connection.disconnect();
connection = None();
{code}

In the above snippet, if {{connection = None()}} gets executed before the actual dispatch to {{ConnectionProcess}} happens, you might lose the only existing reference to the {{Connection}} object inside {{ConnectionProcess::disconnect}}. This would lead to the destruction of the {{Connection}} object in the {{ConnectionProcess}} execution context.

We do have a comment in our existing code that alludes to such occurrences:
https://github.com/apache/mesos/blob/master/3rdparty/libprocess/src/http.cpp#L1325

{code}
// This is a one time request which will close the connection when
// the response is received. Since 'Connection' is reference-counted,
// we must keep a copy around until the disconnection occurs. Note
// that in order to avoid a deadlock (Connection destruction occurring
// from the ConnectionProcess execution context), we use 'async'.
{code}

AFAICT, for scenarios where we need to hold on to the {{Connection}} object for later, this approach does not suffice.
[jira] [Created] (MESOS-4654) Add a test env to keep temporary folder
haosdent created MESOS-4654:

Summary: Add a test env to keep temporary folder
Key: MESOS-4654
URL: https://issues.apache.org/jira/browse/MESOS-4654
Project: Mesos
Issue Type: Improvement
Components: technical debt, test
Reporter: haosdent
Assignee: haosdent
Priority: Minor

Currently, we clean up temporary folders after test cases are torn down. But sometimes we want to check the container's stdout/stderr logs, which are located in the temporary folder, to see what happened. We may also want to check resource files in the temporary folder to find out why a test case failed. I think it would be more convenient if we could keep the temporary folder when an environment variable is set.
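A minimal sketch of the opt-in check proposed in MESOS-4654 might look like the following. The variable name {{MESOS_TEST_KEEP_TMP}} and both helper functions are assumptions made for illustration; the issue does not name the variable, and this is not how the Mesos test environment is actually wired.

```cpp
#include <cstdlib>
#include <string>

// Interprets the raw value of the opt-in environment variable.
// Unset, empty, "0" and "false" all mean "clean up as usual".
inline bool keepRequested(const char* value)
{
  if (value == nullptr) {
    return false;
  }

  const std::string v(value);
  return !(v.empty() || v == "0" || v == "false");
}

// MESOS_TEST_KEEP_TMP is a hypothetical name chosen for this sketch; it is
// not an existing Mesos environment variable.
inline bool shouldKeepTemporaryDirectory()
{
  return keepRequested(std::getenv("MESOS_TEST_KEEP_TMP"));
}
```

Test teardown would then skip removing the temporary folder whenever {{shouldKeepTemporaryDirectory()}} returns true, leaving the stdout/stderr logs in place for inspection.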
[jira] [Created] (MESOS-4655) PerfEventIsolatorTest.ROOT_CGROUPS_Sample failed in CentOS 7.1
haosdent created MESOS-4655: --- Summary: PerfEventIsolatorTest.ROOT_CGROUPS_Sample failed in CentOS 7.1 Key: MESOS-4655 URL: https://issues.apache.org/jira/browse/MESOS-4655 Project: Mesos Issue Type: Bug Components: test Reporter: haosdent Assignee: haosdent -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2444) Update mesos presentations documentation
[ https://issues.apache.org/jira/browse/MESOS-2444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15142572#comment-15142572 ]

Michael Park commented on MESOS-2444:

{noformat}
commit da240e2ca55ab3ebd3ca51934bf29a81cb5169ab
Author: Guangya Liu
Date: Thu Feb 11 00:42:32 2016 -0800

    Updated `docs/presentations.md` to include MesosCon 2015 slides.

    Review: https://reviews.apache.org/r/43403/
{noformat}

> Update mesos presentations documentation
>
> Key: MESOS-2444
> URL: https://issues.apache.org/jira/browse/MESOS-2444
> Project: Mesos
> Issue Type: Task
> Components: documentation, project website
> Reporter: Dave Lester
> Assignee: Disha Singh
> Labels: newbie
>
> The list of Mesos presentations in `docs/mesos-presentations.md` only reflects presentations as of mid-2014 and could be more comprehensive. It would be great to include additional presentations (both slides and videos) on this page.
> Optionally, the display of content on this page could be improved, potentially using a table and generating thumbnails for each video/slideshow to make it more visual. If this route is taken, images can be added to docs/images, ideally within a subfolder to organize them.
[jira] [Commented] (MESOS-4655) PerfEventIsolatorTest.ROOT_CGROUPS_Sample failed in CentOS 7.1
[ https://issues.apache.org/jira/browse/MESOS-4655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15142538#comment-15142538 ]

haosdent commented on MESOS-4655:

Patch:
https://reviews.apache.org/r/43283/
https://reviews.apache.org/r/43284/

> PerfEventIsolatorTest.ROOT_CGROUPS_Sample failed in CentOS 7.1
>
> Key: MESOS-4655
> URL: https://issues.apache.org/jira/browse/MESOS-4655
> Project: Mesos
> Issue Type: Bug
> Components: test
> Reporter: haosdent
> Assignee: haosdent
>
> PerfEventIsolatorTest.ROOT_CGROUPS_Sample failed in CentOS 7.1, error log is:
> {code}
> [==] Running 1 test from 1 test case.
> [--] Global test environment set-up.
> [--] 1 test from PerfEventIsolatorTest
> [ RUN ] PerfEventIsolatorTest.ROOT_CGROUPS_Sample
> I0207 00:58:32.392724 16501 perf_event.cpp:71] Creating PerfEvent isolator
> I0207 00:58:32.440187 16501 perf_event.cpp:109] PerfEvent isolator will profile for 250ms every 500ms for events: { cycles, task-clock }
> I0207 00:58:32.443006 16521 perf_event.cpp:217] Preparing perf event cgroup for 239d30bb-f7a1-413b-9d99-0914149d5899
> E0207 00:58:33.224544 16518 perf_event.cpp:408] Failed to get perf sample: Failed to parse perf sample: Failed to parse perf sample line '<not counted>,,cycles,mesos/239d30bb-f7a1-413b-9d99-0914149d5899': Unexpected number of fields
> E0207 00:58:33.727793 16516 perf_event.cpp:408] Failed to get perf sample: Failed to parse perf sample: Failed to parse perf sample line '<not counted>,,cycles,mesos/239d30bb-f7a1-413b-9d99-0914149d5899': Unexpected number of fields
> E0207 00:58:34.230981 16517 perf_event.cpp:408] Failed to get perf sample: Failed to parse perf sample: Failed to parse perf sample line '<not counted>,,cycles,mesos/239d30bb-f7a1-413b-9d99-0914149d5899': Unexpected number of fields
> E0207 00:58:34.734318 16520 perf_event.cpp:408] Failed to get perf sample: Failed to parse perf sample: Failed to parse perf sample line '<not counted>,,cycles,mesos/239d30bb-f7a1-413b-9d99-0914149d5899': Unexpected number of fields
> E0207 00:58:35.237889 16517 perf_event.cpp:408] Failed to get perf sample: Failed to parse perf sample: Failed to parse perf sample line '<not counted>,,cycles,mesos/239d30bb-f7a1-413b-9d99-0914149d5899': Unexpected number of fields
> E0207 00:58:35.742452 16522 perf_event.cpp:408] Failed to get perf sample: Failed to parse perf sample: Failed to parse perf sample line '<not counted>,,cycles,mesos/239d30bb-f7a1-413b-9d99-0914149d5899': Unexpected number of fields
> E0207 00:58:36.246068 16515 perf_event.cpp:408] Failed to get perf sample: Failed to parse perf sample: Failed to parse perf sample line '<not counted>,,cycles,mesos/239d30bb-f7a1-413b-9d99-0914149d5899': Unexpected number of fields
> ../../src/tests/containerizer/isolator_tests.cpp:1083: Failure
> Expected: (statistics1.get().perf().timestamp()) != (statistics2.perf().timestamp()), actual: 1.45478e+09 vs 1.45478e+09
> ../../src/tests/containerizer/isolator_tests.cpp:1085: Failure
> Value of: statistics2.perf().has_cycles()
> Actual: false
> Expected: true
> ../../src/tests/containerizer/isolator_tests.cpp:1088: Failure
> Value of: statistics2.perf().has_task_clock()
> Actual: false
> Expected: true
> [ FAILED ] PerfEventIsolatorTest.ROOT_CGROUPS_Sample (4069 ms)
> [--] 1 test from PerfEventIsolatorTest (4069 ms total)
> [--] Global test environment tear-down
> ../../src/tests/environment.cpp:732: Failure
> Failed
> Tests completed with child processes remaining:
> -+- 16501 /home/haosdent/mesos/build/src/.libs/lt-mesos-tests --gtest_filter=PerfEventIsolatorTest.ROOT_CGROUPS_Sample --verbose
> |-+- 16580 /home/haosdent/mesos/build/src/.libs/lt-mesos-tests --gtest_filter=PerfEventIsolatorTest.ROOT_CGROUPS_Sample --verbose
> | \-+- 16582 perf stat --all-cpus --field-separator , --log-fd 1 --event cycles --cgroup mesos/239d30bb-f7a1-413b-9d99-0914149d5899 --event task-clock --cgroup mesos/239d30bb-f7a1-413b-9d99-0914149d5899 -- sleep 0.25
> | \--- 16584 sleep 0.25
> \--- 16581 ()
> [==] 1 test from 1 test case ran. (4095 ms total)
> [ PASSED ] 0 tests.
> [ FAILED ] 1 test, listed below:
> [ FAILED ] PerfEventIsolatorTest.ROOT_CGROUPS_Sample
> {code}
[jira] [Commented] (MESOS-4039) PerfEventIsolatorTest.ROOT_CGROUPS_Sample fails
[ https://issues.apache.org/jira/browse/MESOS-4039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15142537#comment-15142537 ]

haosdent commented on MESOS-4039:

Sorry, my mistake: I didn't realize this ticket had already been closed.

> PerfEventIsolatorTest.ROOT_CGROUPS_Sample fails
>
> Key: MESOS-4039
> URL: https://issues.apache.org/jira/browse/MESOS-4039
> Project: Mesos
> Issue Type: Bug
> Reporter: Greg Mann
> Assignee: Jan Schlicht
> Labels: mesosphere, test-fail
>
> PerfEventIsolatorTest.ROOT_CGROUPS_Sample fails on CentOS 6.6:
> {code}
> [--] 1 test from PerfEventIsolatorTest
> [ RUN ] PerfEventIsolatorTest.ROOT_CGROUPS_Sample
> ../../src/tests/containerizer/isolator_tests.cpp:848: Failure
> isolator: Perf is not supported
> [ FAILED ] PerfEventIsolatorTest.ROOT_CGROUPS_Sample (79 ms)
> [--] 1 test from PerfEventIsolatorTest (79 ms total)
> [--] Global test environment tear-down
> [==] 1 test from 1 test case ran. (86 ms total)
> [ PASSED ] 0 tests.
> [ FAILED ] 1 test, listed below:
> [ FAILED ] PerfEventIsolatorTest.ROOT_CGROUPS_Sample
> {code}
[jira] [Commented] (MESOS-1356) Uncaught exceptions
[ https://issues.apache.org/jira/browse/MESOS-1356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15142591#comment-15142591 ] Klaus Ma commented on MESOS-1356: - Yes, maybe more or less the issues than description. > Uncaught exceptions > --- > > Key: MESOS-1356 > URL: https://issues.apache.org/jira/browse/MESOS-1356 > Project: Mesos > Issue Type: Bug >Reporter: Niklas Quarfot Nielsen >Assignee: Michael Browning > Labels: coverity, newbie > > We usually do _not_ use exceptions in Mesos, but some libraries may and we > should handle them and perhaps convert them into Try<>/Error. > > *** CID 1213893: Uncaught exception (UNCAUGHT_EXCEPT) > /src/slave/containerizer/linux_launcher.cpp: 148 in > mesos::internal::slave::_childMain(const std::tr1::function &, int > *)() > 142 return (*func)(); > 143 } > 144 > 145 > 146 // Helper that creates a new session then blocks on reading the pipe > before > 147 // calling the supplied function. > >>> CID 1213893: Uncaught exception (UNCAUGHT_EXCEPT) > >>> In function "_childMain" an exception of type > >>> "std::tr1::bad_function_call" is thrown and never caught. > 148 static int _childMain( > 149 const lambda::function& childFunction, > 150 int pipes[2]) > 151 { > 152 // In child. > 153 os::close(pipes[1]); > > *** CID 1213894: Uncaught exception (UNCAUGHT_EXCEPT) > /src/slave/containerizer/linux_launcher.cpp: 137 in > mesos::internal::slave::childMain(void *)() > 131 > 132 return Nothing(); > 133 } > 134 > 135 > 136 // Helper for clone() which expects an int(void*). > >>> CID 1213894: Uncaught exception (UNCAUGHT_EXCEPT) > >>> In function "childMain" an exception of type > >>> "std::tr1::bad_function_call" is thrown and never caught. 
> 137 static int childMain(void* child) > 138 { > 139 const lambda::function * func = > 140 static_cast*> (child); > 141 > 142 return (*func)(); > > *** CID 1213895: Uncaught exception (UNCAUGHT_EXCEPT) > /src/usage/main.cpp: 72 in main() > 66<< endl > 67<< "Supported options:" << endl > 68<< flags.usage(); > 69 } > 70 > 71 > >>> CID 1213895: Uncaught exception (UNCAUGHT_EXCEPT) > >>> In function "main" an exception of type > >>> "google::protobuf::FatalException" is thrown and never caught. > 72 int main(int argc, char** argv) > 73 { > 74 GOOGLE_PROTOBUF_VERIFY_VERSION; > 75 > 76 Flags flags; > 77 > /src/usage/main.cpp: 72 in main() > 66<< endl > 67<< "Supported options:" << endl > 68<< flags.usage(); > 69 } > 70 > 71 > >>> CID 1213895: Uncaught exception (UNCAUGHT_EXCEPT) > >>> In function "main" an exception of type > >>> "google::protobuf::FatalException" is thrown and never caught. > 72 int main(int argc, char** argv) > 73 { > 74 GOOGLE_PROTOBUF_VERIFY_VERSION; > 75 > 76 Flags flags; > 77 > /src/usage/main.cpp: 72 in main() > 66<< endl > 67<< "Supported options:" << endl > 68<< flags.usage(); > 69 } > 70 > 71 > >>> CID 1213895: Uncaught exception (UNCAUGHT_EXCEPT) > >>> In function "main" an exception of type > >>> "google::protobuf::FatalException" is thrown and never caught. > 72 int main(int argc, char** argv) > 73 { > 74 GOOGLE_PROTOBUF_VERIFY_VERSION; > 75 > 76 Flags flags; > 77 > > *** CID 1213896: Uncaught exception (UNCAUGHT_EXCEPT) > /src/launcher/executor.cpp: 423 in main() > 417 }; > 418 > 419 } // namespace internal { > 420 } // namespace mesos { > 421 > 422 > >>> CID 1213896: Uncaught exception (UNCAUGHT_EXCEPT) > >>> In function "main" an exception of type "std::tr1::bad_function_call" > >>> is thrown and never caught. > 423 int main(int argc, char** argv) > 424 { > 425 mesos::internal::CommandExecutor executor; > 426 mesos::MesosExecutorDriver driver(); > 427 return driver.run() == mesos::DRIVER_STOPPED ? 
0 : 1; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4647) Use in_memory as default registry when testing
[ https://issues.apache.org/jira/browse/MESOS-4647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15142676#comment-15142676 ]

haosdent commented on MESOS-4647:

The draft: https://reviews.apache.org/r/43480/

So far it only changes:
# MesosZooKeeperTest
# MasterTest.RecoveredSlaveDoesNotReregister
# MasterTest.RateLimitRecoveredSlaveRemoval
# MasterTest.CancelRecoveredSlaveRemoval

> Use in_memory as default registry when testing
>
> Key: MESOS-4647
> URL: https://issues.apache.org/jira/browse/MESOS-4647
> Project: Mesos
> Issue Type: Improvement
> Reporter: haosdent
> Assignee: haosdent
>
> Currently, we use {{replicated_log}} as the default registry when testing. This causes I/O operations during testing and slows down test cases. We should change it to use {{in_memory}} when testing and only use {{replicated_log}} when necessary.
[jira] [Commented] (MESOS-4039) PerfEventIsolatorTest.ROOT_CGROUPS_Sample fails
[ https://issues.apache.org/jira/browse/MESOS-4039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15142525#comment-15142525 ] haosdent commented on MESOS-4039: - Got it. > PerfEventIsolatorTest.ROOT_CGROUPS_Sample fails > --- > > Key: MESOS-4039 > URL: https://issues.apache.org/jira/browse/MESOS-4039 > Project: Mesos > Issue Type: Bug >Reporter: Greg Mann >Assignee: Jan Schlicht > Labels: mesosphere, test-fail > > PerfEventIsolatorTest.ROOT_CGROUPS_Sample fails on CentOS 6.6: > {code} > [--] 1 test from PerfEventIsolatorTest > [ RUN ] PerfEventIsolatorTest.ROOT_CGROUPS_Sample > ../../src/tests/containerizer/isolator_tests.cpp:848: Failure > isolator: Perf is not supported > [ FAILED ] PerfEventIsolatorTest.ROOT_CGROUPS_Sample (79 ms) > [--] 1 test from PerfEventIsolatorTest (79 ms total) > [--] Global test environment tear-down > [==] 1 test from 1 test case ran. (86 ms total) > [ PASSED ] 0 tests. > [ FAILED ] 1 test, listed below: > [ FAILED ] PerfEventIsolatorTest.ROOT_CGROUPS_Sample > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4655) PerfEventIsolatorTest.ROOT_CGROUPS_Sample failed in CentOS 7.1
[ https://issues.apache.org/jira/browse/MESOS-4655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

haosdent updated MESOS-4655:

Description:
PerfEventIsolatorTest.ROOT_CGROUPS_Sample failed in CentOS 7.1, error log is:
{code}
[==] Running 1 test from 1 test case.
[--] Global test environment set-up.
[--] 1 test from PerfEventIsolatorTest
[ RUN ] PerfEventIsolatorTest.ROOT_CGROUPS_Sample
I0207 00:58:32.392724 16501 perf_event.cpp:71] Creating PerfEvent isolator
I0207 00:58:32.440187 16501 perf_event.cpp:109] PerfEvent isolator will profile for 250ms every 500ms for events: { cycles, task-clock }
I0207 00:58:32.443006 16521 perf_event.cpp:217] Preparing perf event cgroup for 239d30bb-f7a1-413b-9d99-0914149d5899
E0207 00:58:33.224544 16518 perf_event.cpp:408] Failed to get perf sample: Failed to parse perf sample: Failed to parse perf sample line '<not counted>,,cycles,mesos/239d30bb-f7a1-413b-9d99-0914149d5899': Unexpected number of fields
E0207 00:58:33.727793 16516 perf_event.cpp:408] Failed to get perf sample: Failed to parse perf sample: Failed to parse perf sample line '<not counted>,,cycles,mesos/239d30bb-f7a1-413b-9d99-0914149d5899': Unexpected number of fields
E0207 00:58:34.230981 16517 perf_event.cpp:408] Failed to get perf sample: Failed to parse perf sample: Failed to parse perf sample line '<not counted>,,cycles,mesos/239d30bb-f7a1-413b-9d99-0914149d5899': Unexpected number of fields
E0207 00:58:34.734318 16520 perf_event.cpp:408] Failed to get perf sample: Failed to parse perf sample: Failed to parse perf sample line '<not counted>,,cycles,mesos/239d30bb-f7a1-413b-9d99-0914149d5899': Unexpected number of fields
E0207 00:58:35.237889 16517 perf_event.cpp:408] Failed to get perf sample: Failed to parse perf sample: Failed to parse perf sample line '<not counted>,,cycles,mesos/239d30bb-f7a1-413b-9d99-0914149d5899': Unexpected number of fields
E0207 00:58:35.742452 16522 perf_event.cpp:408] Failed to get perf sample: Failed to parse perf sample: Failed to parse perf sample line '<not counted>,,cycles,mesos/239d30bb-f7a1-413b-9d99-0914149d5899': Unexpected number of fields
E0207 00:58:36.246068 16515 perf_event.cpp:408] Failed to get perf sample: Failed to parse perf sample: Failed to parse perf sample line '<not counted>,,cycles,mesos/239d30bb-f7a1-413b-9d99-0914149d5899': Unexpected number of fields
../../src/tests/containerizer/isolator_tests.cpp:1083: Failure
Expected: (statistics1.get().perf().timestamp()) != (statistics2.perf().timestamp()), actual: 1.45478e+09 vs 1.45478e+09
../../src/tests/containerizer/isolator_tests.cpp:1085: Failure
Value of: statistics2.perf().has_cycles()
Actual: false
Expected: true
../../src/tests/containerizer/isolator_tests.cpp:1088: Failure
Value of: statistics2.perf().has_task_clock()
Actual: false
Expected: true
[ FAILED ] PerfEventIsolatorTest.ROOT_CGROUPS_Sample (4069 ms)
[--] 1 test from PerfEventIsolatorTest (4069 ms total)
[--] Global test environment tear-down
../../src/tests/environment.cpp:732: Failure
Failed
Tests completed with child processes remaining:
-+- 16501 /home/haosdent/mesos/build/src/.libs/lt-mesos-tests --gtest_filter=PerfEventIsolatorTest.ROOT_CGROUPS_Sample --verbose
|-+- 16580 /home/haosdent/mesos/build/src/.libs/lt-mesos-tests --gtest_filter=PerfEventIsolatorTest.ROOT_CGROUPS_Sample --verbose
| \-+- 16582 perf stat --all-cpus --field-separator , --log-fd 1 --event cycles --cgroup mesos/239d30bb-f7a1-413b-9d99-0914149d5899 --event task-clock --cgroup mesos/239d30bb-f7a1-413b-9d99-0914149d5899 -- sleep 0.25
| \--- 16584 sleep 0.25
\--- 16581 ()
[==] 1 test from 1 test case ran. (4095 ms total)
[ PASSED ] 0 tests.
[ FAILED ] 1 test, listed below:
[ FAILED ] PerfEventIsolatorTest.ROOT_CGROUPS_Sample
{code}

was:
{code}
[==] Running 1 test from 1 test case.
[--] Global test environment set-up.
[--] 1 test from PerfEventIsolatorTest
[ RUN ] PerfEventIsolatorTest.ROOT_CGROUPS_Sample
I0207 00:58:32.392724 16501 perf_event.cpp:71] Creating PerfEvent isolator
I0207 00:58:32.440187 16501 perf_event.cpp:109] PerfEvent isolator will profile for 250ms every 500ms for events: { cycles, task-clock }
I0207 00:58:32.443006 16521 perf_event.cpp:217] Preparing perf event cgroup for 239d30bb-f7a1-413b-9d99-0914149d5899
E0207 00:58:33.224544 16518 perf_event.cpp:408] Failed to get perf sample: Failed to parse perf sample: Failed to parse perf sample line '<not counted>,,cycles,mesos/239d30bb-f7a1-413b-9d99-0914149d5899': Unexpected number of fields
E0207 00:58:33.727793 16516 perf_event.cpp:408] Failed to get perf sample: Failed to parse perf sample: Failed to parse perf sample line '<not counted>,,cycles,mesos/239d30bb-f7a1-413b-9d99-0914149d5899': Unexpected number of fields
E0207 00:58:34.230981 16517 perf_event.cpp:408] Failed to get perf sample: Failed to parse perf sample: Failed to parse perf sample line '<not counted>,,cycles,mesos/239d30bb-f7a1-413b-9d99-0914149d5899': Unexpected number
[jira] [Updated] (MESOS-4653) Unify test case temporary folder name format
[ https://issues.apache.org/jira/browse/MESOS-4653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

haosdent updated MESOS-4653:

Component/s: test

> Unify test case temporary folder name format
>
> Key: MESOS-4653
> URL: https://issues.apache.org/jira/browse/MESOS-4653
> Project: Mesos
> Issue Type: Improvement
> Components: test
> Reporter: haosdent
> Assignee: haosdent
> Priority: Minor
> Labels: test
>
> In [environment.cpp#L759|https://github.com/apache/mesos/blob/master/src/tests/environment.cpp#L759]
> {code}
> const string& path =
>   path::join("/tmp", strings::join("_", testCase, testName, "XX"));
> {code}
> the temporary folder name format is {{testCase_testName_xx}}.
> But in [utils.hpp#L37|https://github.com/apache/mesos/blob/master/3rdparty/libprocess/3rdparty/stout/include/stout/tests/utils.hpp#L37]
> {code}
> // Create a temporary directory for the test.
> Try<std::string> directory = os::mkdtemp();
> {code}
> the temporary folder we create is named {{xx}}. I think it would be better if we could unify this.
[jira] [Updated] (MESOS-4653) Unify test case temporary folder name format
[ https://issues.apache.org/jira/browse/MESOS-4653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

haosdent updated MESOS-4653:

Labels: test (was: )

> Unify test case temporary folder name format
>
> Key: MESOS-4653
> URL: https://issues.apache.org/jira/browse/MESOS-4653
> Project: Mesos
> Issue Type: Improvement
> Components: test
> Reporter: haosdent
> Assignee: haosdent
> Priority: Minor
> Labels: test
>
> In [environment.cpp#L759|https://github.com/apache/mesos/blob/master/src/tests/environment.cpp#L759]
> {code}
> const string& path =
>   path::join("/tmp", strings::join("_", testCase, testName, "XX"));
> {code}
> the temporary folder name format is {{testCase_testName_xx}}.
> But in [utils.hpp#L37|https://github.com/apache/mesos/blob/master/3rdparty/libprocess/3rdparty/stout/include/stout/tests/utils.hpp#L37]
> {code}
> // Create a temporary directory for the test.
> Try<std::string> directory = os::mkdtemp();
> {code}
> the temporary folder we create is named {{xx}}. I think it would be better if we could unify this.
[jira] [Commented] (MESOS-4039) PerfEventIsolatorTest.ROOT_CGROUPS_Sample fails
[ https://issues.apache.org/jira/browse/MESOS-4039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15142513#comment-15142513 ] Jan Schlicht commented on MESOS-4039: - Looks like the test fails with a different reason than the one in this bug. Can you create a separate JIRA issue for your case to better track the differences? I'll close this issue here again, because the {{Perf is not supported}} failure has been fixed. > PerfEventIsolatorTest.ROOT_CGROUPS_Sample fails > --- > > Key: MESOS-4039 > URL: https://issues.apache.org/jira/browse/MESOS-4039 > Project: Mesos > Issue Type: Bug >Reporter: Greg Mann >Assignee: Jan Schlicht > Labels: mesosphere, test-fail > > PerfEventIsolatorTest.ROOT_CGROUPS_Sample fails on CentOS 6.6: > {code} > [--] 1 test from PerfEventIsolatorTest > [ RUN ] PerfEventIsolatorTest.ROOT_CGROUPS_Sample > ../../src/tests/containerizer/isolator_tests.cpp:848: Failure > isolator: Perf is not supported > [ FAILED ] PerfEventIsolatorTest.ROOT_CGROUPS_Sample (79 ms) > [--] 1 test from PerfEventIsolatorTest (79 ms total) > [--] Global test environment tear-down > [==] 1 test from 1 test case ran. (86 ms total) > [ PASSED ] 0 tests. > [ FAILED ] 1 test, listed below: > [ FAILED ] PerfEventIsolatorTest.ROOT_CGROUPS_Sample > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4039) PerfEventIsolatorTest.ROOT_CGROUPS_Sample fails
[ https://issues.apache.org/jira/browse/MESOS-4039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15142532#comment-15142532 ] Jan Schlicht commented on MESOS-4039: - Thanks! > PerfEventIsolatorTest.ROOT_CGROUPS_Sample fails > --- > > Key: MESOS-4039 > URL: https://issues.apache.org/jira/browse/MESOS-4039 > Project: Mesos > Issue Type: Bug >Reporter: Greg Mann >Assignee: Jan Schlicht > Labels: mesosphere, test-fail > > PerfEventIsolatorTest.ROOT_CGROUPS_Sample fails on CentOS 6.6: > {code} > [--] 1 test from PerfEventIsolatorTest > [ RUN ] PerfEventIsolatorTest.ROOT_CGROUPS_Sample > ../../src/tests/containerizer/isolator_tests.cpp:848: Failure > isolator: Perf is not supported > [ FAILED ] PerfEventIsolatorTest.ROOT_CGROUPS_Sample (79 ms) > [--] 1 test from PerfEventIsolatorTest (79 ms total) > [--] Global test environment tear-down > [==] 1 test from 1 test case ran. (86 ms total) > [ PASSED ] 0 tests. > [ FAILED ] 1 test, listed below: > [ FAILED ] PerfEventIsolatorTest.ROOT_CGROUPS_Sample > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4353) Limit the number of processes created by libprocess
[ https://issues.apache.org/jira/browse/MESOS-4353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15142699#comment-15142699 ] Maged Michael commented on MESOS-4353: -- Replying to Joris: > I don't think it makes sense to make this a maximum. Rather, it is just the > number of libprocess_worker_threads. My concern is that the number may be set to a very large value. How about we set a hardwired maximum value to limit the given value if it is too large? > Limit the number of processes created by libprocess > --- > > Key: MESOS-4353 > URL: https://issues.apache.org/jira/browse/MESOS-4353 > Project: Mesos > Issue Type: Improvement > Components: libprocess >Reporter: Qian Zhang >Assignee: Qian Zhang > > Currently libprocess will create {{max(8, number of CPU cores)}} processes > during the initialization, see > https://github.com/apache/mesos/blob/0.26.0/3rdparty/libprocess/src/process.cpp#L2146 > for details. This should be OK for a normal machine which does not have many cores > (e.g., 16, 32), but for a powerful machine which may have a large number of > cores (e.g., an IBM Power machine may have 192 cores), this will create too > many worker threads, which are not necessary. > And since libprocess is widely used in Mesos (master, agent, scheduler, > executor), it may also cause performance issues. For example, when a user > creates a Docker container via Mesos in a Mesos agent which is running on a > powerful machine with 192 cores, the DockerContainerizer in the Mesos agent will > create a dedicated executor for the container, and there will be 192 worker > threads in that executor. And if the user creates 1000 Docker containers on that > machine, then there will be 1000 executors, i.e., 1000 * 192 worker threads, > which is a large number and may thrash the OS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
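The proposal in the comment above (honour a configured worker-thread count, but cap it with a hardwired maximum) could be sketched roughly as follows. This is only an illustration under stated assumptions: `workerThreadCount`, its parameters, and the convention that `requested == 0` means "not set" are hypothetical and are not part of libprocess; only the `max(8, number of CPU cores)` default comes from the issue description. The sketch also caps the default path, which is a choice made here and not something the thread has agreed on.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper, not libprocess API: picks a worker-thread count.
// `cpus` is the detected core count, `requested` is an optional user
// override (0 means "not set"), and `hardMax` is the hardwired ceiling.
std::size_t workerThreadCount(
    std::size_t cpus,
    std::size_t requested,
    std::size_t hardMax)
{
  const std::size_t minimum = 8; // Floor taken from the issue description.
  assert(hardMax >= minimum);

  // Without an override, keep the current max(8, cores) behaviour.
  const std::size_t wanted =
    requested > 0 ? requested : std::max(minimum, cpus);

  // Cap the result so that neither a large core count nor a large
  // override can exceed the hardwired maximum.
  return std::min(wanted, hardMax);
}
```

With a ceiling of 64, for example, a 192-core machine would get 64 worker threads per libprocess instance instead of 192, while an explicit small override would pass through unchanged.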
[jira] [Updated] (MESOS-4647) Use in_memory as default registry when testing
[ https://issues.apache.org/jira/browse/MESOS-4647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] haosdent updated MESOS-4647: Description: Currently, we use {{replicated_log}} as default registry when testing. This causes I/O operations during testing and slows down test cases. We should change it to use {{in_memory}} when testing and only use {{replicated_log}} when necessary. Test results without sudo: Before {code} [--] Global test environment tear-down [==] 978 tests from 129 test cases ran. (678321 ms total) [ PASSED ] 978 tests. {code} After {code} [--] Global test environment tear-down [==] 978 tests from 129 test cases ran. (422265 ms total) [ PASSED ] 978 tests. {code} was:Currently, we use {{replicated_log}} as default registry when testing. This causes I/O operations during testing and slows down test cases. We should change it to use {{in_memory}} when testing and only use {{replicated_log}} when necessary. > Use in_memory as default registry when testing > -- > > Key: MESOS-4647 > URL: https://issues.apache.org/jira/browse/MESOS-4647 > Project: Mesos > Issue Type: Improvement >Reporter: haosdent >Assignee: haosdent > > Currently, we use {{replicated_log}} as default registry when testing. This > causes I/O operations during testing and slows down test cases. We should change > it to use {{in_memory}} when testing and only use {{replicated_log}} when > necessary. > Test results without sudo: > Before > {code} > [--] Global test environment tear-down > [==] 978 tests from 129 test cases ran. (678321 ms total) > [ PASSED ] 978 tests. > {code} > After > {code} > [--] Global test environment tear-down > [==] 978 tests from 129 test cases ran. (422265 ms total) > [ PASSED ] 978 tests. > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4646) PortMappingIsolatorTests get kernel stuck.
[ https://issues.apache.org/jira/browse/MESOS-4646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15142766#comment-15142766 ] Till Toenshoff commented on MESOS-4646: --- I now tried a 4.3 kernel. The results are just a bit better in that the kernel does not get stuck but the tests still fail utterly while getting stuck themselves. {noformat} [ RUN ] PortMappingIsolatorTest.ROOT_NC_ContainerToContainerTCP I0211 05:56:11.255408 90890 port_mapping_tests.cpp:224] Using eth0 as the public interface I0211 05:56:11.255954 90890 port_mapping_tests.cpp:232] Using lo as the loopback interface I0211 05:56:13.144747 90890 port_mapping.cpp:1255] Using eth0 as the public interface I0211 05:56:13.145141 90890 port_mapping.cpp:1280] Using lo as the loopback interface I0211 05:56:13.146286 90890 port_mapping.cpp:1567] /proc/sys/net/ipv4/neigh/default/gc_thresh3 = '1024' I0211 05:56:13.146486 90890 port_mapping.cpp:1567] /proc/sys/net/ipv4/neigh/default/gc_thresh1 = '128' I0211 05:56:13.146747 90890 port_mapping.cpp:1567] /proc/sys/net/ipv4/tcp_wmem = '4096 16384 4194304' I0211 05:56:13.147191 90890 port_mapping.cpp:1567] /proc/sys/net/ipv4/tcp_synack_retries = '5' I0211 05:56:13.147518 90890 port_mapping.cpp:1567] /proc/sys/net/core/somaxconn = '128' I0211 05:56:13.147707 90890 port_mapping.cpp:1567] /proc/sys/net/core/rmem_max = '212992' I0211 05:56:13.147971 90890 port_mapping.cpp:1567] /proc/sys/net/ipv4/tcp_rmem = '4096 87380 6291456' I0211 05:56:13.148393 90890 port_mapping.cpp:1567] /proc/sys/net/core/wmem_max = '212992' I0211 05:56:13.148653 90890 port_mapping.cpp:1567] /proc/sys/net/ipv4/tcp_keepalive_time = '7200' I0211 05:56:13.148808 90890 port_mapping.cpp:1567] /proc/sys/net/ipv4/neigh/default/gc_thresh2 = '512' I0211 05:56:13.148962 90890 port_mapping.cpp:1567] /proc/sys/net/core/netdev_max_backlog = '1000' I0211 05:56:13.150074 90890 port_mapping.cpp:1567] /proc/sys/net/ipv4/tcp_keepalive_intvl = '75' I0211 05:56:13.150271 90890 
port_mapping.cpp:1567] /proc/sys/net/ipv4/tcp_keepalive_probes = '9' I0211 05:56:13.150394 90890 port_mapping.cpp:1567] /proc/sys/net/ipv4/tcp_max_syn_backlog = '512' I0211 05:56:13.150619 90890 port_mapping.cpp:1567] /proc/sys/net/ipv4/tcp_retries2 = '15' I0211 05:56:17.074481 90890 linux_launcher.cpp:101] Using /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher I0211 05:56:17.078749 90909 port_mapping.cpp:2162] Using non-ephemeral ports {[31000,31500)} and ephemeral ports [30016,30032) for container container1 of executor '' I0211 05:56:17.334048 90890 linux_launcher.cpp:363] Cloning child process with flags = CLONE_NEWNET | CLONE_NEWNS ../../src/tests/containerizer/port_mapping_tests.cpp:507: Failure Failed to wait 15secs for isolator.get()->isolate(containerId1, pid.get()) I0211 05:56:34.901305 90907 port_mapping.cpp:2226] Bind mounted '/proc/90956/ns/net' to '/var/run/netns/90956' for container container1 [ FAILED ] PortMappingIsolatorTest.ROOT_NC_ContainerToContainerTCP (29652 ms) [ RUN ] PortMappingIsolatorTest.ROOT_NC_ContainerToContainerUDP I0211 05:56:40.905812 90890 port_mapping_tests.cpp:224] Using eth0 as the public interface I0211 05:56:40.906904 90890 port_mapping_tests.cpp:232] Using lo as the loopback interface I0211 05:56:40.938251 90890 port_mapping.cpp:1255] Using eth0 as the public interface I0211 05:56:40.938639 90890 port_mapping.cpp:1280] Using lo as the loopback interface I0211 05:56:41.037220 90890 port_mapping.cpp:1567] /proc/sys/net/ipv4/neigh/default/gc_thresh3 = '1024' I0211 05:56:41.037513 90890 port_mapping.cpp:1567] /proc/sys/net/ipv4/neigh/default/gc_thresh1 = '128' I0211 05:56:41.037768 90890 port_mapping.cpp:1567] /proc/sys/net/ipv4/tcp_wmem = '4096 16384 4194304' I0211 05:56:41.038230 90890 port_mapping.cpp:1567] /proc/sys/net/ipv4/tcp_synack_retries = '5' I0211 05:56:41.038434 90890 port_mapping.cpp:1567] /proc/sys/net/core/somaxconn = '128' I0211 05:56:41.038596 90890 port_mapping.cpp:1567] 
/proc/sys/net/core/rmem_max = '212992' I0211 05:56:41.051391 90890 port_mapping.cpp:1567] /proc/sys/net/ipv4/tcp_rmem = '4096 87380 6291456' I0211 05:56:41.051430 90890 port_mapping.cpp:1567] /proc/sys/net/core/wmem_max = '212992' I0211 05:56:41.051456 90890 port_mapping.cpp:1567] /proc/sys/net/ipv4/tcp_keepalive_time = '7200' I0211 05:56:41.051482 90890 port_mapping.cpp:1567] /proc/sys/net/ipv4/neigh/default/gc_thresh2 = '512' I0211 05:56:41.051507 90890 port_mapping.cpp:1567] /proc/sys/net/core/netdev_max_backlog = '1000' I0211 05:56:41.051534 90890 port_mapping.cpp:1567] /proc/sys/net/ipv4/tcp_keepalive_intvl = '75' I0211 05:56:41.051558 90890 port_mapping.cpp:1567] /proc/sys/net/ipv4/tcp_keepalive_probes = '9' I0211 05:56:41.051583 90890 port_mapping.cpp:1567] /proc/sys/net/ipv4/tcp_max_syn_backlog = '512' I0211 05:56:41.051606 90890 port_mapping.cpp:1567] /proc/sys/net/ipv4/tcp_retries2 = '15'
[jira] [Commented] (MESOS-4519) configure.ac uses a mix of tabs and spaces indentation
[ https://issues.apache.org/jira/browse/MESOS-4519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15142730#comment-15142730 ] Klaus Ma commented on MESOS-4519: - [~jvanremoortere], would you help to shepherd this? > configure.ac uses a mix of tabs and spaces indentation > -- > > Key: MESOS-4519 > URL: https://issues.apache.org/jira/browse/MESOS-4519 > Project: Mesos > Issue Type: Bug > Components: build >Reporter: Benjamin Bannier >Priority: Trivial > Labels: newbie > > configure.ac uses a mix of tabs and spaces for indention while only spaces > should be used. Replacing tabs with 8 spaces each appears to be safe and > seems to give the desired indention. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4646) PortMappingIsolatorTests get kernel stuck.
[ https://issues.apache.org/jira/browse/MESOS-4646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15142782#comment-15142782 ] Till Toenshoff commented on MESOS-4646: --- Ow, after having left the machine in that state for a few minutes, at some point the kernel got stuck as well, even with 4.3. > PortMappingIsolatorTests get kernel stuck. > -- > > Key: MESOS-4646 > URL: https://issues.apache.org/jira/browse/MESOS-4646 > Project: Mesos > Issue Type: Bug > Environment: Linux Kernel 3.19.9-49-generic, > libnl-3.2.27 >Reporter: Till Toenshoff >Assignee: Cong Wang > > {noformat} > $ sudo ./bin/mesos-tests.sh --gtest_filter="*PortMappingIsolatorTest*" > Source directory: /home/till/scratchpad/mesos > Build directory: /home/till/scratchpad/mesos/build > - > We cannot run any cgroups tests that require mounting > hierarchies because you have the following hierarchies mounted: > /sys/fs/cgroup/blkio, /sys/fs/cgroup/cpu, /sys/fs/cgroup/cpuacct, > /sys/fs/cgroup/cpuset, /sys/fs/cgroup/devices, /sys/fs/cgroup/freezer, > /sys/fs/cgroup/hugetlb, /sys/fs/cgroup/memory, /sys/fs/cgroup/net_cls, > /sys/fs/cgroup/net_prio, /sys/fs/cgroup/perf_event, /sys/fs/cgroup/systemd > We'll disable the CgroupsNoHierarchyTest test fixture for now. 
> - > WARNING: perf not found for kernel 3.19.0-49 > You may need to install the following packages for this specific kernel: > linux-tools-3.19.0-49-generic > linux-cloud-tools-3.19.0-49-generic > You may also want to install one of the following packages to keep up to > date: > linux-tools-generic-lts- > linux-cloud-tools-generic-lts- > - > No 'perf' command found so no 'perf' tests will be run > - > WARNING: perf not found for kernel 3.19.0-49 > You may need to install the following packages for this specific kernel: > linux-tools-3.19.0-49-generic > linux-cloud-tools-3.19.0-49-generic > You may also want to install one of the following packages to keep up to > date: > linux-tools-generic-lts- > linux-cloud-tools-generic-lts- > - > The 'perf' command wasn't found so tests using it > to sample the 'cycles' hardware event will not be run. > - > /bin/nc > /usr/local/bin/curl > Note: Google Test filter = >
[jira] [Commented] (MESOS-4643) PortMappingIsolatorTest fail when no namespaces are set.
[ https://issues.apache.org/jira/browse/MESOS-4643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15142770#comment-15142770 ] Till Toenshoff commented on MESOS-4643: --- My workaround was to simply add a namespace before running the test-suite: {{sudo ip netns add foo}} > PortMappingIsolatorTest fail when no namespaces are set. > > > Key: MESOS-4643 > URL: https://issues.apache.org/jira/browse/MESOS-4643 > Project: Mesos > Issue Type: Bug > Environment: Linux Kernel 3.19.0-49-generic, > libnl-3.2.27 >Reporter: Till Toenshoff >Priority: Minor > > Currently our network isolator tests fail with the following output on an > Ubuntu 14.04 VM. > {noformat} > [02:10:15][Step 8/8] [ RUN ] > PortMappingIsolatorTest.ROOT_NC_ContainerToContainerTCP > [02:10:15][Step 8/8] > ../../src/tests/containerizer/port_mapping_tests.cpp:164: Failure > [02:10:15][Step 8/8] entries: Failed to opendir '/var/run/netns': No such > file or directory > [02:10:15][Step 8/8] > ../../src/tests/containerizer/port_mapping_tests.cpp:164: Failure > [02:10:15][Step 8/8] entries: Failed to opendir '/var/run/netns': No such > file or directory > [02:10:15][Step 8/8] [ FAILED ] > PortMappingIsolatorTest.ROOT_NC_ContainerToContainerTCP (4 ms) > {noformat} > The machine has no network namespaces set, hence {{/var/run/netns}} does not > exist. > We should help users understand this prerequisite or maybe even set this > up in a fixture. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
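One way to surface the prerequisite described above would be a small pre-flight check that a test fixture runs before touching the namespace directory. The following is a minimal sketch, not code from the Mesos tree: `checkNetnsDirectory` is a hypothetical name, and the hint in the message simply repeats the workaround from the comment.

```cpp
#include <filesystem>
#include <optional>
#include <string>
#include <system_error>

// Hypothetical pre-flight check for a test fixture; returns a message to
// show the user when the prerequisite is missing, or nothing when it is
// satisfied. Symlinks in `dir` are followed.
std::optional<std::string> checkNetnsDirectory(
    const std::filesystem::path& dir = "/var/run/netns")
{
  std::error_code ec;
  if (std::filesystem::is_directory(dir, ec)) {
    return std::nullopt;
  }

  return "Directory '" + dir.string() + "' does not exist; create a network "
         "namespace first, e.g. 'sudo ip netns add foo'";
}
```

A fixture could call this during set-up and skip or fail early with the returned message, instead of failing later with the less obvious `Failed to opendir` error.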
[jira] [Created] (MESOS-4657) Add LOG(INFO) in `cgroups/net_cls` for debugging allocation of net_cls handles.
Avinash Sridharan created MESOS-4657: Summary: Add LOG(INFO) in `cgroups/net_cls` for debugging allocation of net_cls handles. Key: MESOS-4657 URL: https://issues.apache.org/jira/browse/MESOS-4657 Project: Mesos Issue Type: Improvement Components: containerization Environment: Linux Reporter: Avinash Sridharan Assignee: Avinash Sridharan Priority: Minor We need to add LOG(INFO) during the prepare phase of `cgroups/net_cls` for debugging management of `net_cls` handles within the isolator. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4479) Implement reservation labels
[ https://issues.apache.org/jira/browse/MESOS-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15143764#comment-15143764 ] Michael Park commented on MESOS-4479: - {noformat} commit 3b02b80fae886caccd242f5fc205e91a42723861 Author: Neil Conway Date: Thu Feb 11 16:07:05 2016 -0800 Added documentation for labeled reserved resources. Review: https://reviews.apache.org/r/42755/ {noformat} {noformat} commit 77448c0bda4109ceb0c2aadbb5d240faa12b1f3e Author: Neil Conway Date: Thu Feb 11 15:56:39 2016 -0800 Added support for labels to resource reservations. Labels are free-form key-value pairs that can be used to associate metadata with reserved resources. Review: https://reviews.apache.org/r/42754/ {noformat} > Implement reservation labels > > > Key: MESOS-4479 > URL: https://issues.apache.org/jira/browse/MESOS-4479 > Project: Mesos > Issue Type: Improvement > Components: master >Reporter: Neil Conway >Assignee: Neil Conway > Labels: labels, mesosphere, reservations > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-4661) SlaveRecoveryTest/0.ReconnectHTTPExecutor is flaky
Anand Mazumdar created MESOS-4661: - Summary: SlaveRecoveryTest/0.ReconnectHTTPExecutor is flaky Key: MESOS-4661 URL: https://issues.apache.org/jira/browse/MESOS-4661 Project: Mesos Issue Type: Bug Reporter: Anand Mazumdar Showed up on ASF CI: https://builds.apache.org/job/Mesos/COMPILER=gcc,CONFIGURATION=--verbose,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=ubuntu:14.04,label_exp=docker%7C%7CHadoop/1660/consoleFull {code} [ RUN ] SlaveRecoveryTest/0.ReconnectHTTPExecutor I0212 00:23:08.177824 702 leveldb.cpp:174] Opened db in 2.499462ms I0212 00:23:08.179204 702 leveldb.cpp:181] Compacted db in 1.206514ms I0212 00:23:08.179400 702 leveldb.cpp:196] Created db iterator in 36168ns I0212 00:23:08.179538 702 leveldb.cpp:202] Seeked to beginning of db in 2343ns I0212 00:23:08.179651 702 leveldb.cpp:271] Iterated through 0 keys in the db in 471ns I0212 00:23:08.179816 702 replica.cpp:779] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned I0212 00:23:08.180547 736 recover.cpp:447] Starting replica recovery I0212 00:23:08.181025 736 recover.cpp:473] Replica is in EMPTY status I0212 00:23:08.182406 722 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (9452)@172.17.0.2:57200 I0212 00:23:08.182624 724 recover.cpp:193] Received a recover response from a replica in EMPTY status I0212 00:23:08.183368 736 recover.cpp:564] Updating replica status to STARTING I0212 00:23:08.184329 730 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 726589ns I0212 00:23:08.184361 730 replica.cpp:320] Persisted replica status to STARTING I0212 00:23:08.184501 722 recover.cpp:473] Replica is in STARTING status I0212 00:23:08.186000 733 replica.cpp:673] Replica in STARTING status received a broadcasted recover request from (9453)@172.17.0.2:57200 I0212 00:23:08.186311 735 recover.cpp:193] Received a recover response from a replica in STARTING status I0212 00:23:08.186650 724 recover.cpp:564] Updating replica status to VOTING I0212 
00:23:08.186785 727 master.cpp:376] Master 6508f198-e145-4d76-844f-0460dc5d7d39 (ca60addecc0b) started on 172.17.0.2:57200 I0212 00:23:08.186808 727 master.cpp:378] Flags at startup: --acls="" --allocation_interval="1secs" --allocator="HierarchicalDRF" --authenticate="true" --authenticate_http="true" --authenticate_slaves="true" --authenticators="crammd5" --authorizers="local" --credentials="/tmp/9KHFn8/credentials" --framework_sorter="drf" --help="false" --hostname_lookup="true" --http_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" --quiet="false" --recovery_slave_removal_limit="100%" --registry="replicated_log" --registry_fetch_timeout="1mins" --registry_store_timeout="100secs" --registry_strict="true" --root_submissions="true" --slave_ping_timeout="15secs" --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" --webui_dir="/mesos/mesos-0.28.0/_inst/share/mesos/webui" --work_dir="/tmp/9KHFn8/master" --zk_session_timeout="10secs" I0212 00:23:08.187353 727 master.cpp:423] Master only allowing authenticated frameworks to register I0212 00:23:08.187366 727 master.cpp:428] Master only allowing authenticated slaves to register I0212 00:23:08.187376 727 credentials.hpp:35] Loading credentials for authentication from '/tmp/9KHFn8/credentials' I0212 00:23:08.187533 724 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 460382ns I0212 00:23:08.187676 724 replica.cpp:320] Persisted replica status to VOTING I0212 00:23:08.187770 727 master.cpp:468] Using default 'crammd5' authenticator I0212 00:23:08.188096 727 master.cpp:537] Using default 'basic' HTTP authenticator I0212 00:23:08.188344 727 master.cpp:571] Authorization enabled I0212 00:23:08.188544 728 recover.cpp:578] Successfully joined the Paxos group I0212 00:23:08.189209 722 hierarchical.cpp:144] 
Initialized hierarchical allocator process I0212 00:23:08.189337 731 whitelist_watcher.cpp:77] No whitelist given I0212 00:23:08.189357 728 recover.cpp:462] Recover process terminated I0212 00:23:08.192903 733 master.cpp:1712] The newly elected leader is master@172.17.0.2:57200 with id 6508f198-e145-4d76-844f-0460dc5d7d39 I0212 00:23:08.192940 733 master.cpp:1725] Elected as the leading master! I0212 00:23:08.193133 733 master.cpp:1470] Recovering from registrar I0212 00:23:08.193269 734 registrar.cpp:307] Recovering registrar I0212 00:23:08.194031 734 log.cpp:659] Attempting to start the writer I0212 00:23:08.195296 730 replica.cpp:493] Replica received implicit promise request from (9455)@172.17.0.2:57200 with proposal 1 I0212 00:23:08.196018 730
[jira] [Created] (MESOS-4662) PortMapping network isolator should not assume BIND_MOUNT_ROOT is a realpath.
Jie Yu created MESOS-4662: - Summary: PortMapping network isolator should not assume BIND_MOUNT_ROOT is a realpath. Key: MESOS-4662 URL: https://issues.apache.org/jira/browse/MESOS-4662 Project: Mesos Issue Type: Bug Affects Versions: 0.25.0, 0.26.0, 0.27.0 Reporter: Jie Yu On some newer linux distributions, /var/run is a symlink to /run. The port mapping isolator assumes that PORT_MAPPING_BIND_MOUNT_ROOT is a realpath (exists in the mount table), which obviously is not true on those systems. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
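A possible direction for the issue above is to resolve symlinks in the bind mount root before comparing it against mount table entries. The sketch below only illustrates that idea with the C++17 standard library; `resolveBindMountRoot` is a hypothetical helper, not the isolator's actual code, and it assumes that mount table entries are listed under their symlink-free paths.

```cpp
#include <filesystem>
#include <string>
#include <system_error>

// Hypothetical helper: returns `root` with symlinks resolved, e.g.
// '/var/run/netns' becomes '/run/netns' when '/var/run' links to '/run'.
// Falls back to the unmodified input if resolution fails.
std::string resolveBindMountRoot(const std::string& root)
{
  std::error_code ec;
  const std::filesystem::path resolved =
    std::filesystem::weakly_canonical(root, ec);

  return ec ? root : resolved.string();
}
```

`weakly_canonical` is used here instead of `canonical` so that a root whose last component does not exist yet is still resolved as far as possible; whether that is the right behaviour for the isolator is an open design question.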
[jira] [Updated] (MESOS-4439) Fix appc CachedImage image validation
[ https://issues.apache.org/jira/browse/MESOS-4439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu updated MESOS-4439: -- Story Points: 1 (was: 2) > Fix appc CachedImage image validation > - > > Key: MESOS-4439 > URL: https://issues.apache.org/jira/browse/MESOS-4439 > Project: Mesos > Issue Type: Task > Components: containerization >Reporter: Jojy Varghese >Assignee: Jojy Varghese > Labels: mesosphere, unified-containerizer-mvp > Fix For: 0.28.0 > > > Currently image validation is done assuming that the image's filename will > have digest (SHA-512) information. This is not part of the spec > (https://github.com/appc/spec/blob/master/spec/discovery.md). > > The spec specifies the tuple as unique identifier > for discovering an image. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-4658) process::Connection can lead to deadlock around execution in the same context.
[ https://issues.apache.org/jira/browse/MESOS-4658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuai Lin reassigned MESOS-4658: Assignee: Shuai Lin > process::Connection can lead to deadlock around execution in the same context. > -- > > Key: MESOS-4658 > URL: https://issues.apache.org/jira/browse/MESOS-4658 > Project: Mesos > Issue Type: Bug > Components: HTTP API, libprocess >Reporter: Anand Mazumdar >Assignee: Shuai Lin > Labels: mesosphere > > The {{Connection}} abstraction is prone to deadlocks arising from the object > being destroyed inside the same execution context. > Consider this example: > {code} > Option<Connection> connection = process::http::connect(...); > connection.disconnected() > .onAny(defer(self(), , connection)); > connection.disconnect(); > connection = None(); > {code} > In the above snippet, if {{connection = None()}} is executed > before the actual dispatch to {{ConnectionProcess}} happens, you might lose > the only existing reference to the {{Connection}} object inside > {{ConnectionProcess::disconnect}}. This would lead to the destruction of the > {{Connection}} object in the {{ConnectionProcess}} execution context. > We do have a snippet in our existing code that alludes to such occurrences > happening: > https://github.com/apache/mesos/blob/master/3rdparty/libprocess/src/http.cpp#L1325 > {code} > // This is a one time request which will close the connection when > // the response is received. Since 'Connection' is reference-counted, > // we must keep a copy around until the disconnection occurs. Note > // that in order to avoid a deadlock (Connection destruction occurring > // from the ConnectionProcess execution context), we use 'async'. > {code} > AFAICT, for scenarios where we need to hold on to the {{Connection}} object > for later, this approach does not suffice. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2971) Implement OverlayFS based provisioner backend
[ https://issues.apache.org/jira/browse/MESOS-2971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15143721#comment-15143721 ] Mei Wan commented on MESOS-2971: Hi Shuai, I have a reviewboard request still under review, https://reviews.apache.org/r/37853/, but I haven't had much time to look at it. Feel free to take a look or start afresh! > Implement OverlayFS based provisioner backend > - > > Key: MESOS-2971 > URL: https://issues.apache.org/jira/browse/MESOS-2971 > Project: Mesos > Issue Type: Improvement >Reporter: Timothy Chen >Assignee: Mei Wan > Labels: mesosphere, twitter, unified-containerizer-mvp > > Part of the image provisioning process is to call a backend to create a root > filesystem based on the image's on-disk layout. > The problem with the copy backend is that it wastes both IO and space, > and the bind backend can only deal with one layer. > The OverlayFS backend allows us to utilize the filesystem to merge multiple > filesystems into one efficiently. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4596) Add common Appc spec utilities.
[ https://issues.apache.org/jira/browse/MESOS-4596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu updated MESOS-4596: -- Story Points: 2 (was: 3) > Add common Appc spec utilities. > --- > > Key: MESOS-4596 > URL: https://issues.apache.org/jira/browse/MESOS-4596 > Project: Mesos > Issue Type: Task > Components: containerization >Reporter: Jojy Varghese >Assignee: Jojy Varghese > Labels: mesosphere, unified-containerizer-mvp > Fix For: 0.28.0 > > > Add common utility functions such as : > - validating image information against actual data in the image > directory. > - getting list of dependencies at depth 1 for an image. > - getting image path simple image discovery. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2971) Implement OverlayFS based provisioner backend
[ https://issues.apache.org/jira/browse/MESOS-2971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15143713#comment-15143713 ] Shuai Lin commented on MESOS-2971: -- Hi all, What's the status of this ticket? [~mwan] Can I take it if you haven't been working on it recently? > Implement OverlayFS based provisioner backend > - > > Key: MESOS-2971 > URL: https://issues.apache.org/jira/browse/MESOS-2971 > Project: Mesos > Issue Type: Improvement >Reporter: Timothy Chen >Assignee: Mei Wan > Labels: mesosphere, twitter, unified-containerizer-mvp > > Part of the image provisioning process is to call a backend to create a root > filesystem based on the image's on-disk layout. > The problem with the copy backend is that it wastes both IO and space, > and the bind backend can only deal with one layer. > The OverlayFS backend allows us to utilize the filesystem to merge multiple > filesystems into one efficiently. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-4164) MasterTest.RecoverResources is slow
[ https://issues.apache.org/jira/browse/MESOS-4164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] haosdent reassigned MESOS-4164: --- Assignee: haosdent > MasterTest.RecoverResources is slow > --- > > Key: MESOS-4164 > URL: https://issues.apache.org/jira/browse/MESOS-4164 > Project: Mesos > Issue Type: Improvement > Components: technical debt, test >Reporter: Alexander Rukletsov >Assignee: haosdent >Priority: Minor > Labels: mesosphere, newbie++, tech-debt > > The {{MasterTest.RecoverResources}} test takes more than {{1s}} to finish on > my Mac OS 10.10.4: > {code} > MasterTest.RecoverResources (1018 ms) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4173) HealthCheckTest.CheckCommandTimeout is slow
[ https://issues.apache.org/jira/browse/MESOS-4173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] haosdent updated MESOS-4173: Assignee: Timothy Chen (was: haosdent) > HealthCheckTest.CheckCommandTimeout is slow > --- > > Key: MESOS-4173 > URL: https://issues.apache.org/jira/browse/MESOS-4173 > Project: Mesos > Issue Type: Improvement > Components: technical debt, test >Reporter: Alexander Rukletsov >Assignee: Timothy Chen >Priority: Minor > Labels: mesosphere, newbie++, tech-debt > > The {{HealthCheckTest.CheckCommandTimeout}} test takes more than {{15s}}! to > finish on my Mac OS 10.10.4: > {code} > HealthCheckTest.CheckCommandTimeout (15483 ms) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-4165) MasterTest.MasterInfoOnReElection is slow
[ https://issues.apache.org/jira/browse/MESOS-4165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] haosdent reassigned MESOS-4165: --- Assignee: haosdent > MasterTest.MasterInfoOnReElection is slow > - > > Key: MESOS-4165 > URL: https://issues.apache.org/jira/browse/MESOS-4165 > Project: Mesos > Issue Type: Improvement > Components: technical debt, test >Reporter: Alexander Rukletsov >Assignee: haosdent >Priority: Minor > Labels: mesosphere, newbie++, tech-debt > > The {{MasterTest.MasterInfoOnReElection}} test takes more than {{1s}} to > finish on my Mac OS 10.10.4: > {code} > MasterTest.MasterInfoOnReElection (1024 ms) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-4170) OversubscriptionTest.UpdateAllocatorOnSchedulerFailover is slow
[ https://issues.apache.org/jira/browse/MESOS-4170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] haosdent reassigned MESOS-4170: --- Assignee: haosdent > OversubscriptionTest.UpdateAllocatorOnSchedulerFailover is slow > --- > > Key: MESOS-4170 > URL: https://issues.apache.org/jira/browse/MESOS-4170 > Project: Mesos > Issue Type: Improvement > Components: technical debt, test >Reporter: Alexander Rukletsov >Assignee: haosdent >Priority: Minor > Labels: mesosphere, newbie++, tech-debt > > The {{OversubscriptionTest.UpdateAllocatorOnSchedulerFailover}} test takes > more than {{1s}} to finish on my Mac OS 10.10.4: > {code} > OversubscriptionTest.UpdateAllocatorOnSchedulerFailover (1018 ms) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-4172) GarbageCollectorIntegrationTest.Restart is slow
[ https://issues.apache.org/jira/browse/MESOS-4172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] haosdent reassigned MESOS-4172: --- Assignee: haosdent > GarbageCollectorIntegrationTest.Restart is slow > --- > > Key: MESOS-4172 > URL: https://issues.apache.org/jira/browse/MESOS-4172 > Project: Mesos > Issue Type: Improvement > Components: technical debt, test >Reporter: Alexander Rukletsov >Assignee: haosdent >Priority: Minor > Labels: mesosphere, newbie++, tech-debt > > The {{GarbageCollectorIntegrationTest.Restart}} test takes more than {{5s}} > to finish on my Mac OS 10.10.4: > {code} > GarbageCollectorIntegrationTest.Restart (5102 ms) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-4168) MasterMaintenanceTest.EnterMaintenanceMode is slow
[ https://issues.apache.org/jira/browse/MESOS-4168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] haosdent reassigned MESOS-4168: --- Assignee: haosdent > MasterMaintenanceTest.EnterMaintenanceMode is slow > --- > > Key: MESOS-4168 > URL: https://issues.apache.org/jira/browse/MESOS-4168 > Project: Mesos > Issue Type: Improvement > Components: technical debt, test >Reporter: Alexander Rukletsov >Assignee: haosdent >Priority: Minor > Labels: mesosphere, newbie++, tech-debt > > The {{MasterMaintenanceTest.EnterMaintenanceMode}} test takes more than > {{5s}} to finish on my Mac OS 10.10.4: > {code} > MasterMaintenanceTest.EnterMaintenanceMode (5087 ms) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-4171) OversubscriptionTest.RemoveCapabilitiesOnSchedulerFailover is slow
[ https://issues.apache.org/jira/browse/MESOS-4171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] haosdent reassigned MESOS-4171: --- Assignee: haosdent > OversubscriptionTest.RemoveCapabilitiesOnSchedulerFailover is slow > -- > > Key: MESOS-4171 > URL: https://issues.apache.org/jira/browse/MESOS-4171 > Project: Mesos > Issue Type: Improvement > Components: technical debt, test >Reporter: Alexander Rukletsov >Assignee: haosdent >Priority: Minor > Labels: mesosphere, newbie++, tech-debt > > The {{OversubscriptionTest.RemoveCapabilitiesOnSchedulerFailover}} test takes > more than {{1s}} to finish on my Mac OS 10.10.4: > {code} > OversubscriptionTest.RemoveCapabilitiesOnSchedulerFailover (1018 ms) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-4167) MasterTest.OfferTimeout is slow
[ https://issues.apache.org/jira/browse/MESOS-4167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] haosdent reassigned MESOS-4167: --- Assignee: haosdent > MasterTest.OfferTimeout is slow > --- > > Key: MESOS-4167 > URL: https://issues.apache.org/jira/browse/MESOS-4167 > Project: Mesos > Issue Type: Improvement > Components: technical debt, test >Reporter: Alexander Rukletsov >Assignee: haosdent >Priority: Minor > Labels: mesosphere, newbie++, tech-debt > > The {{MasterTest.OfferTimeout}} test takes more than {{1s}} to finish on my > Mac OS 10.10.4: > {code} > MasterTest.OfferTimeout (1053 ms) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-4166) MasterTest.LaunchCombinedOfferTest is slow
[ https://issues.apache.org/jira/browse/MESOS-4166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] haosdent reassigned MESOS-4166: --- Assignee: haosdent > MasterTest.LaunchCombinedOfferTest is slow > -- > > Key: MESOS-4166 > URL: https://issues.apache.org/jira/browse/MESOS-4166 > Project: Mesos > Issue Type: Improvement > Components: technical debt, test >Reporter: Alexander Rukletsov >Assignee: haosdent >Priority: Minor > Labels: mesosphere, newbie++, tech-debt > > The {{MasterTest.LaunchCombinedOfferTest}} test takes more than {{2s}} to > finish on my Mac OS 10.10.4: > {code} > MasterTest.LaunchCombinedOfferTest (2023 ms) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-4173) HealthCheckTest.CheckCommandTimeout is slow
[ https://issues.apache.org/jira/browse/MESOS-4173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] haosdent reassigned MESOS-4173: --- Assignee: haosdent > HealthCheckTest.CheckCommandTimeout is slow > --- > > Key: MESOS-4173 > URL: https://issues.apache.org/jira/browse/MESOS-4173 > Project: Mesos > Issue Type: Improvement > Components: technical debt, test >Reporter: Alexander Rukletsov >Assignee: haosdent >Priority: Minor > Labels: mesosphere, newbie++, tech-debt > > The {{HealthCheckTest.CheckCommandTimeout}} test takes more than {{15s}}! to > finish on my Mac OS 10.10.4: > {code} > HealthCheckTest.CheckCommandTimeout (15483 ms) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-4169) MasterMaintenanceTest.InverseOffers is slow
[ https://issues.apache.org/jira/browse/MESOS-4169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] haosdent reassigned MESOS-4169: --- Assignee: haosdent > MasterMaintenanceTest.InverseOffers is slow > --- > > Key: MESOS-4169 > URL: https://issues.apache.org/jira/browse/MESOS-4169 > Project: Mesos > Issue Type: Improvement > Components: technical debt, test >Reporter: Alexander Rukletsov >Assignee: haosdent >Priority: Minor > Labels: mesosphere, newbie++, tech-debt > > The {{MasterMaintenanceTest.InverseOffers}} test takes more than {{2s}} to > finish on my Mac OS 10.10.4: > {code} > MasterMaintenanceTest.InverseOffers (2027 ms) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-297) Speed up the slow running tests.
[ https://issues.apache.org/jira/browse/MESOS-297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] haosdent updated MESOS-297: --- Assignee: Benjamin Mahler > Speed up the slow running tests. > > > Key: MESOS-297 > URL: https://issues.apache.org/jira/browse/MESOS-297 > Project: Mesos > Issue Type: Improvement >Reporter: Benjamin Mahler >Assignee: Benjamin Mahler >Priority: Minor > > The tests currently take 70 seconds on my machine: > [==] 200 tests from 37 test cases ran. (68963 ms total) > There are some major culprits: > [--] 12 tests from ZooKeeperTest (27484 ms total) > [--] 5 tests from SampleFrameworks (12529 ms total) > [--] 8 tests from ResourceOffersTest (4166 ms total) > [--] 2 tests from AllocatorZooKeeperTest/0 (4128 ms total) > [--] 3 tests from GarbageCollectorTest (3117 ms total) > Hopefully there are some quick gains to be had. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3738) Mesos health check is invoked incorrectly when Mesos slave is within the docker container
[ https://issues.apache.org/jira/browse/MESOS-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15144020#comment-15144020 ] Evan Krall commented on MESOS-3738: --- Any chance we could get that patch applied and versions 0.23.2, 0.24.2, and 0.25.2 released? > Mesos health check is invoked incorrectly when Mesos slave is within the > docker container > - > > Key: MESOS-3738 > URL: https://issues.apache.org/jira/browse/MESOS-3738 > Project: Mesos > Issue Type: Bug > Components: containerization, docker >Affects Versions: 0.25.0 > Environment: Docker 1.8.0: > Client: > Version: 1.8.0 > API version: 1.20 > Go version: go1.4.2 > Git commit: 0d03096 > Built:Tue Aug 11 16:48:39 UTC 2015 > OS/Arch: linux/amd64 > Server: > Version: 1.8.0 > API version: 1.20 > Go version: go1.4.2 > Git commit: 0d03096 > Built:Tue Aug 11 16:48:39 UTC 2015 > OS/Arch: linux/amd64 > Host: Ubuntu 14.04 > Container: Debian 8.1 + Java-7 >Reporter: Yong Tang >Assignee: haosdent > Fix For: 0.26.0 > > Attachments: MESOS-3738-0_23_1.patch, MESOS-3738-0_24_1.patch, > MESOS-3738-0_25_0.patch > > > When the Mesos slave runs within a container, the COMMAND health check from > Marathon is invoked incorrectly. > In such a scenario, the sandbox directory (instead of the > launcher/health-check directory) is used. This results in an error with the > container. 
> Command to invoke the Mesos slave container: > {noformat} > sudo docker run -d -v /sys:/sys -v /usr/bin/docker:/usr/bin/docker:ro -v > /usr/lib/x86_64-linux-gnu/libapparmor.so.1:/usr/lib/x86_64-linux-gnu/libapparmor.so.1:ro > -v /var/run/docker.sock:/var/run/docker.sock -v /tmp/mesos:/tmp/mesos mesos > mesos slave --master=zk://10.2.1.2:2181/mesos --containerizers=docker,mesos > --executor_registration_timeout=5mins --docker_stop_timeout=10secs > --launcher=posix > {noformat} > Marathon JSON file: > {code} > { > "id": "ubuntu", > "container": > { > "type": "DOCKER", > "docker": > { > "image": "ubuntu", > "network": "BRIDGE", > "parameters": [] > } > }, > "args": [ "bash", "-c", "while true; do echo 1; sleep 5; done" ], > "uris": [], > "healthChecks": > [ > { > "protocol": "COMMAND", > "command": { "value": "echo Success" }, > "gracePeriodSeconds": 3000, > "intervalSeconds": 5, > "timeoutSeconds": 5, > "maxConsecutiveFailures": 300 > } > ], > "instances": 1 > } > {code} > {noformat} > STDOUT: > root@cea2be47d64f:/mnt/mesos/sandbox# cat stdout > --container="mesos-e20f8959-cd9f-40ae-987d-809401309361-S0.815cc886-1cd1-4f13-8f9b-54af1f127c3f" > --docker="docker" --docker_socket="/var/run/docker.sock" --help="false" > --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO" > --mapped_directory="/mnt/mesos/sandbox" --quiet="false" > --sandbox_directory="/tmp/mesos/slaves/e20f8959-cd9f-40ae-987d-809401309361-S0/frameworks/e20f8959-cd9f-40ae-987d-809401309361-/executors/ubuntu.86bca10f-72c9-11e5-b36d-02420a020106/runs/815cc886-1cd1-4f13-8f9b-54af1f127c3f" > --stop_timeout="10secs" > --container="mesos-e20f8959-cd9f-40ae-987d-809401309361-S0.815cc886-1cd1-4f13-8f9b-54af1f127c3f" > --docker="docker" --docker_socket="/var/run/docker.sock" --help="false" > --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO" > --mapped_directory="/mnt/mesos/sandbox" --quiet="false" > 
--sandbox_directory="/tmp/mesos/slaves/e20f8959-cd9f-40ae-987d-809401309361-S0/frameworks/e20f8959-cd9f-40ae-987d-809401309361-/executors/ubuntu.86bca10f-72c9-11e5-b36d-02420a020106/runs/815cc886-1cd1-4f13-8f9b-54af1f127c3f" > --stop_timeout="10secs" > Registered docker executor on b01e2e75afcb > Starting task ubuntu.86bca10f-72c9-11e5-b36d-02420a020106 > 1 > Launching health check process: > /tmp/mesos/slaves/e20f8959-cd9f-40ae-987d-809401309361-S0/frameworks/e20f8959-cd9f-40ae-987d-809401309361-/executors/ubuntu.86bca10f-72c9-11e5-b36d-02420a020106/runs/815cc886-1cd1-4f13-8f9b-54af1f127c3f/mesos-health-check > --executor=(1)@10.2.1.7:40695 > --health_check_json={"command":{"shell":true,"value":"docker exec > mesos-e20f8959-cd9f-40ae-987d-809401309361-S0.815cc886-1cd1-4f13-8f9b-54af1f127c3f > sh -c \" echo Success > \""},"consecutive_failures":300,"delay_seconds":0.0,"grace_period_seconds":3000.0,"interval_seconds":5.0,"timeout_seconds":5.0} > --task_id=ubuntu.86bca10f-72c9-11e5-b36d-02420a020106 > Health check process launched at pid: 94 > 1 > 1 > 1 > 1 > 1 > STDERR: > root@cea2be47d64f:/mnt/mesos/sandbox# cat stderr > I1014 23:15:58.12795056 exec.cpp:134] Version: 0.25.0 > I1014 23:15:58.13062762 exec.cpp:208] Executor registered on
[jira] [Commented] (MESOS-4653) Unify test case temporary folder name format
[ https://issues.apache.org/jira/browse/MESOS-4653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15142981#comment-15142981 ] Joseph Wu commented on MESOS-4653: -- [~haosd...@gmail.com] If you'd like to work on this, feel free to take [MESOS-3848], which is already scoped. I believe [~jieyu] should be willing to shepherd (but confirm that he has cycles first). > Unify test case temporary folder name format > > > Key: MESOS-4653 > URL: https://issues.apache.org/jira/browse/MESOS-4653 > Project: Mesos > Issue Type: Improvement > Components: test >Reporter: haosdent >Assignee: haosdent >Priority: Minor > Labels: test > > In > [environment.cpp#L759|https://github.com/apache/mesos/blob/master/src/tests/environment.cpp#L759] > {code} > const string& path = > path::join("/tmp", strings::join("_", testCase, testName, "XX")); > {code} > The temporary directory name format here is {{testCase_testName_xx}}. > But in > [utils.hpp#L37|https://github.com/apache/mesos/blob/master/3rdparty/libprocess/3rdparty/stout/include/stout/tests/utils.hpp#L37] > {code} > // Create a temporary directory for the test. > Try<std::string> directory = os::mkdtemp(); > {code} > The temporary folder we create here is {{xx}}. I think it would be better > if we could unify this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3848) Refactor Environment::mkdtemp into TemporaryDirectoryTest.
[ https://issues.apache.org/jira/browse/MESOS-3848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15143001#comment-15143001 ] haosdent commented on MESOS-3848: - {quote} Move the temporary directory logic from Environment::mkdtemp to TemporaryDirectoryTest. {quote} +1 Does this mean we could call mkdtemp multiple times in {{TemporaryDirectoryTest}} and destroy the directories in {{TemporaryDirectoryTest::TearDown}}, just as we do now in {{Environment::TearDown}}? I also saw that * process_tests.cpp * subprocess_tests.cpp * zookeeper_test_server.cpp still use os::mkdtemp. I think it would be better to change them to use the directory created by {{TemporaryDirectoryTest}}. > Refactor Environment::mkdtemp into TemporaryDirectoryTest. > -- > > Key: MESOS-3848 > URL: https://issues.apache.org/jira/browse/MESOS-3848 > Project: Mesos > Issue Type: Task > Components: test >Reporter: Joseph Wu >Assignee: Joseph Wu >Priority: Minor > Labels: mesosphere > > As part of [MESOS-3762], many tests were changed from one > {{TemporaryDirectoryTest}} to another {{TemporaryDirectoryTest}}. One subtle > difference is that the name of the temporary directory no longer contains the > name of the test. In [MESOS-3847], the duplicate {{TemporaryDirectoryTest}} > was removed. > The original {{TemporaryDirectoryTest}} called > [{{environment->mkdtemp}}|https://github.com/apache/mesos/blob/master/src/tests/environment.cpp#L494]. > We would like the naming, which is valuable for debugging, to be available > for a majority of tests. (A majority of tests inherit from > {{TemporaryDirectoryTest}} in some way.) > Note: > * Any additional directories created via {{environment->mkdtemp}} are cleaned > up after the test. > * We don't want mesos-specific logic in Stout, like the {{umount}} shell > command in {{Environment::TearDown}}. > *Proposed change:* > Move the temporary directory logic from {{Environment::mkdtemp}} to > {{TemporaryDirectoryTest}}. 
> *Tests that need to change* > | {{log_tests.cpp}} | {{LogZooKeeperTest}} | We can change {{ZooKeeperTest}} > to inherit from {{TemporaryDirectoryTest}} to get rid of code duplication | > | {{tests/mesos.cpp}} | {{MesosTest::CreateSlaveFlags}} | {{MesosTest}} > already inherits from {{TemporaryDirectoryTest}}. | > | {{tests/script.hpp}} | {{TEST_SCRIPT}} | This is used for the > {{ExampleTests}}. We can define a test class that inherits appropriately. | > | {{docker_tests.cpp}} | {{*}} | Already inherits from {{MesosTest}}. | -- This message was sent by Atlassian JIRA (v6.3.4#6332)
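The idea discussed in MESOS-3848 (a fixture that hands out test-named directories, possibly several per test, and removes them all at teardown) might look roughly like the sketch below. It is an illustration under stated assumptions rather than the proposed patch: `NamedTemporaryDirectories` is a hypothetical class, it uses C++17 `std::filesystem` instead of Stout, and a real gtest fixture would derive from `::testing::Test` and do this work in `SetUp`/`TearDown`.

```cpp
#include <stdlib.h>  // mkdtemp (POSIX)

#include <filesystem>
#include <stdexcept>
#include <string>
#include <system_error>
#include <vector>

// Hypothetical stand-in for a test fixture that owns its temporary
// directories and names them after the running test.
class NamedTemporaryDirectories
{
public:
  NamedTemporaryDirectories(
      const std::string& testCase,
      const std::string& testName)
    : prefix("/tmp/" + testCase + "_" + testName + "_") {}

  // Copying would lead to the same directories being removed twice.
  NamedTemporaryDirectories(const NamedTemporaryDirectories&) = delete;
  NamedTemporaryDirectories& operator=(
      const NamedTemporaryDirectories&) = delete;

  // Mirrors the TearDown step: removes every directory created so far.
  ~NamedTemporaryDirectories() { tearDown(); }

  // May be called several times per test; each call creates and returns
  // a fresh directory.
  std::string mkdtemp()
  {
    std::string tmpl = prefix + "XXXXXX";
    if (::mkdtemp(&tmpl[0]) == nullptr) {
      throw std::runtime_error("mkdtemp failed for template " + tmpl);
    }
    directories.push_back(tmpl);
    return tmpl;
  }

  void tearDown()
  {
    for (const std::string& directory : directories) {
      std::error_code error;  // Failures during cleanup are ignored.
      std::filesystem::remove_all(directory, error);
    }
    directories.clear();
  }

private:
  const std::string prefix;
  std::vector<std::string> directories;
};
```

Keeping the list of created directories inside the fixture is one way to cover tests that need more than one working directory without leaving cleanup logic in a global environment object.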
[jira] [Commented] (MESOS-3848) Refactor Environment::mkdtemp into TemporaryDirectoryTest.
[ https://issues.apache.org/jira/browse/MESOS-3848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15143027#comment-15143027 ] haosdent commented on MESOS-3848: - Got it. Thank you. > Refactor Environment::mkdtemp into TemporaryDirectoryTest. > -- > > Key: MESOS-3848 > URL: https://issues.apache.org/jira/browse/MESOS-3848 > Project: Mesos > Issue Type: Task > Components: test >Reporter: Joseph Wu >Assignee: Joseph Wu >Priority: Minor > Labels: mesosphere > > As part of [MESOS-3762], many tests were changed from one > {{TemporaryDirectoryTest}} to another {{TemporaryDirectoryTest}}. One subtle > difference is that the name of the temporary directory no longer contains the > name of the test. In [MESOS-3847], the duplicate {{TemporaryDirectoryTest}} > was removed. > The original {{TemporaryDirectoryTest}} called > [{{environment->mkdtemp}}|https://github.com/apache/mesos/blob/master/src/tests/environment.cpp#L494]. > We would like the naming, which is valuable for debugging, to be available > for a majority of tests. (A majority of tests inherit from > {{TemporaryDirectoryTest}} in some way.) > Note: > * Any additional directories created via {{environment->mkdtemp}} are cleaned > up after the test. > * We don't want mesos-specific logic in Stout, like the {{umount}} shell > command in {{Environment::TearDown}}. > *Proposed change:* > Move the temporary directory logic from {{Environment::mkdtemp}} to > {{TemporaryDirectoryTest}}. > *Tests that need to change* > | {{log_tests.cpp}} | {{LogZooKeeperTest}} | We can change {{ZooKeeperTest}} > to inherit from {{TemporaryDirectoryTest}} to get rid of code duplication | > | {{tests/mesos.cpp}} | {{MesosTest::CreateSlaveFlags}} | {{MesosTest}} > already inherits from {{TemporaryDirectoryTest}}. | > | {{tests/script.hpp}} | {{TEST_SCRIPT}} | This is used for the > {{ExampleTests}}. We can define a test class that inherits appropriately. 
| > | {{docker_tests.cpp}} | {{*}} | Already inherits from {{MesosTest}}. | -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4547) Introduce TASK_KILLING state.
[ https://issues.apache.org/jira/browse/MESOS-4547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15143070#comment-15143070 ] Abhishek Dasgupta commented on MESOS-4547: -- Please find the patches at: https://reviews.apache.org/r/43487/ https://reviews.apache.org/r/43488/ https://reviews.apache.org/r/43489/ https://reviews.apache.org/r/43490/ > Introduce TASK_KILLING state. > - > > Key: MESOS-4547 > URL: https://issues.apache.org/jira/browse/MESOS-4547 > Project: Mesos > Issue Type: Improvement >Reporter: Benjamin Mahler >Assignee: Abhishek Dasgupta > Labels: mesosphere > > Currently there is no state to express that a task is being killed, but is > not yet killed (see MESOS-4140). In a similar way to how we have > TASK_STARTING to indicate the task is starting but not yet running, a > TASK_KILLING state would indicate the task is being killed but is not yet > killed. > This would need to be guarded by a framework capability to protect old > frameworks that cannot understand the TASK_KILLING state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
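The capability guard mentioned in MESOS-4547 can be pictured with a small sketch. Everything here is hypothetical and simplified: the real Mesos task states and framework capabilities are protobuf definitions, and reporting `RUNNING` to frameworks that have not opted in is only one possible fallback, chosen here for illustration.

```cpp
#include <set>

// Hypothetical, simplified types for illustration only.
enum class TaskState { STARTING, RUNNING, KILLING, KILLED };
enum class Capability { TASK_KILLING_STATE };

// Returns the state to report to a framework. Frameworks that did not
// opt in to the capability never see KILLING; in this sketch they keep
// seeing RUNNING until the task reaches KILLED.
TaskState stateForFramework(
    TaskState actual,
    const std::set<Capability>& capabilities)
{
  const bool understandsKilling =
    capabilities.count(Capability::TASK_KILLING_STATE) > 0;

  if (actual == TaskState::KILLING && !understandsKilling) {
    return TaskState::RUNNING;
  }

  return actual;
}
```

The point of the guard is that an old framework only ever receives states it already knows how to handle, while a framework that declares the capability gets the more precise intermediate state.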
[jira] [Commented] (MESOS-3296) Failing ROOT_ tests on CentOS 7.1 - LinuxFilesystemIsolatorTest
[ https://issues.apache.org/jira/browse/MESOS-3296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15143071#comment-15143071 ] haosdent commented on MESOS-3296: - {code} [ RUN ] LinuxFilesystemIsolatorTest.ROOT_ChangeRootFilesystem I0212 01:11:25.390995 25282 linux.cpp:81] Making '/tmp/LinuxFilesystemIsolatorTest_ROOT_ChangeRootFilesystem_f15DnB' a shared mount I0212 01:11:25.402125 25282 linux_launcher.cpp:101] Using /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher I0212 01:11:25.404479 25282 systemd.cpp:223] systemd version `219` detected I0212 01:11:25.405414 25303 containerizer.cpp:666] Starting container '720db9f5-14b8-4c1f-9d1c-1ad52a1ae3d7' for executor 'test_executor' of framework '' I0212 01:11:25.407177 25299 provisioner.cpp:285] Provisioning image rootfs '/tmp/LinuxFilesystemIsolatorTest_ROOT_ChangeRootFilesystem_f15DnB/provisioner/containers/720db9f5-14b8-4c1f-9d1c-1ad52a1ae3d7/backends/copy/rootfses/b121c623-51db-4ab3-8daf-de3aae6c56d6' for container 720db9f5-14b8-4c1f-9d1c-1ad52a1ae3d7 I0212 01:11:28.773602 25299 linux.cpp:306] Bind mounting work directory from '/tmp/Pe7dyr/sandbox' to '/tmp/LinuxFilesystemIsolatorTest_ROOT_ChangeRootFilesystem_f15DnB/provisioner/containers/720db9f5-14b8-4c1f-9d1c-1ad52a1ae3d7/backends/copy/rootfses/b121c623-51db-4ab3-8daf-de3aae6c56d6/mnt/mesos/sandbox' for container 720db9f5-14b8-4c1f-9d1c-1ad52a1ae3d7 I0212 01:11:28.778959 25302 linux_launcher.cpp:304] Cloning child process with flags = CLONE_NEWNS + /home/haosdent/mesos/build/src/mesos-containerizer mount --help=false --operation=make-rslave --path=/ + grep -E /tmp/LinuxFilesystemIsolatorTest_ROOT_ChangeRootFilesystem_f15DnB/.+ /proc/self/mountinfo + grep -v 720db9f5-14b8-4c1f-9d1c-1ad52a1ae3d7 + cut '-d ' -f5 + xargs --no-run-if-empty umount -l Changing root to 
/tmp/LinuxFilesystemIsolatorTest_ROOT_ChangeRootFilesystem_f15DnB/provisioner/containers/720db9f5-14b8-4c1f-9d1c-1ad52a1ae3d7/backends/copy/rootfses/b121c623-51db-4ab3-8daf-de3aae6c56d6 I0212 01:11:28.972585 25299 containerizer.cpp:1585] Executor for container '720db9f5-14b8-4c1f-9d1c-1ad52a1ae3d7' has exited I0212 01:11:28.972681 25299 containerizer.cpp:1369] Destroying container '720db9f5-14b8-4c1f-9d1c-1ad52a1ae3d7' I0212 01:11:28.977098 25297 cgroups.cpp:2427] Freezing cgroup /sys/fs/cgroup/freezer/mesos/720db9f5-14b8-4c1f-9d1c-1ad52a1ae3d7 I0212 01:11:28.980403 25296 cgroups.cpp:1409] Successfully froze cgroup /sys/fs/cgroup/freezer/mesos/720db9f5-14b8-4c1f-9d1c-1ad52a1ae3d7 after 3.208704ms I0212 01:11:28.983417 25298 cgroups.cpp:2445] Thawing cgroup /sys/fs/cgroup/freezer/mesos/720db9f5-14b8-4c1f-9d1c-1ad52a1ae3d7 I0212 01:11:28.986616 25296 cgroups.cpp:1438] Successfullly thawed cgroup /sys/fs/cgroup/freezer/mesos/720db9f5-14b8-4c1f-9d1c-1ad52a1ae3d7 after 3.096832ms I0212 01:11:28.990787 25303 linux.cpp:768] Unmounting sandbox/work directory '/tmp/LinuxFilesystemIsolatorTest_ROOT_ChangeRootFilesystem_f15DnB/provisioner/containers/720db9f5-14b8-4c1f-9d1c-1ad52a1ae3d7/ backends/copy/rootfses/b121c623-51db-4ab3-8daf-de3aae6c56d6/mnt/mesos/sandbox' for container 720db9f5-14b8-4c1f-9d1c-1ad52a1ae3d7 I0212 01:11:28.991528 25300 provisioner.cpp:330] Destroying container rootfs at '/tmp/LinuxFilesystemIsolatorTest_ROOT_ChangeRootFilesystem_f15DnB/provisioner/containers/720db9f5-14b8-4c1f-9d1c-1ad52a1ae3 d7/backends/copy/rootfses/b121c623-51db-4ab3-8daf-de3aae6c56d6' for container 720db9f5-14b8-4c1f-9d1c-1ad52a1ae3d7 ../../src/tests/containerizer/filesystem_isolator_tests.cpp:284: Failure Failed to wait 15secs for wait [ FAILED ] LinuxFilesystemIsolatorTest.ROOT_ChangeRootFilesystem (43267 ms) [--] 1 test from LinuxFilesystemIsolatorTest (43267 ms total) [--] Global test environment tear-down ../../src/tests/environment.cpp:728: Failure Failed Tests completed with 
child processes remaining: -+- 25282 /home/haosdent/mesos/build/src/.libs/lt-mesos-tests --gtest_filter=LinuxFilesystemIsolatorTest.ROOT_ChangeRootFilesystem --verbose \--- 25367 () [==] 1 test from 1 test case ran. (43468 ms total) [ PASSED ] 0 tests. [ FAILED ] 1 test, listed below: [ FAILED ] LinuxFilesystemIsolatorTest.ROOT_ChangeRootFilesystem {code} I tried updating systemd, but that did not help. {code} # rpm -qa|grep systemd systemd-219-19.el7.x86_64 systemd-sysv-219-19.el7.x86_64 systemd-libs-219-19.el7.x86_64 {code} > Failing ROOT_ tests on CentOS 7.1 - LinuxFilesystemIsolatorTest > --- > > Key: MESOS-3296 > URL: https://issues.apache.org/jira/browse/MESOS-3296 > Project: Mesos > Issue Type: Bug > Components: containerization, docker, test >Affects Versions: 0.23.0, 0.24.0 > Environment: CentOS Linux release 7.1 > Linux 3.10.0 >Reporter: Marco Massenzio >Assignee: Greg Mann >
[jira] [Updated] (MESOS-4657) Add LOG(INFO) in `cgroups/net_cls` for debugging allocation of net_cls handles.
[ https://issues.apache.org/jira/browse/MESOS-4657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu updated MESOS-4657: -- Sprint: Mesosphere Sprint 29 > Add LOG(INFO) in `cgroups/net_cls` for debugging allocation of net_cls > handles. > --- > > Key: MESOS-4657 > URL: https://issues.apache.org/jira/browse/MESOS-4657 > Project: Mesos > Issue Type: Improvement > Components: containerization > Environment: Linux >Reporter: Avinash Sridharan >Assignee: Avinash Sridharan >Priority: Minor > Labels: mesosphere > > We need to add LOG(INFO) during the prepare phase of `cgroups/net_cls` for > debugging management of `net_cls` handles within the isolator. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4657) Add LOG(INFO) in `cgroups/net_cls` for debugging allocation of net_cls handles.
[ https://issues.apache.org/jira/browse/MESOS-4657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu updated MESOS-4657: -- Sprint: Mesosphere Sprint 28 (was: Mesosphere Sprint 29) > Add LOG(INFO) in `cgroups/net_cls` for debugging allocation of net_cls > handles. > --- > > Key: MESOS-4657 > URL: https://issues.apache.org/jira/browse/MESOS-4657 > Project: Mesos > Issue Type: Improvement > Components: containerization > Environment: Linux >Reporter: Avinash Sridharan >Assignee: Avinash Sridharan >Priority: Minor > Labels: mesosphere > > We need to add LOG(INFO) during the prepare phase of `cgroups/net_cls` for > debugging management of `net_cls` handles within the isolator. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4653) Unify test case temporary folder name format
[ https://issues.apache.org/jira/browse/MESOS-4653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15142984#comment-15142984 ] haosdent commented on MESOS-4653: - Thank you very much. Let me close this. > Unify test case temporary folder name format > > > Key: MESOS-4653 > URL: https://issues.apache.org/jira/browse/MESOS-4653 > Project: Mesos > Issue Type: Improvement > Components: test >Reporter: haosdent >Assignee: haosdent >Priority: Minor > Labels: test > > In > [environment.cpp#L759|https://github.com/apache/mesos/blob/master/src/tests/environment.cpp#L759] > {code} > const string& path = > path::join("/tmp", strings::join("_", testCase, testName, "XX")); > {code} > The temporary directory name format here is {{testCase_testName_xx}}. > But in > [utils.hpp#L37|https://github.com/apache/mesos/blob/master/3rdparty/libprocess/3rdparty/stout/include/stout/tests/utils.hpp#L37] > {code} > // Create a temporary directory for the test. > Try<std::string> directory = os::mkdtemp(); > {code} > The temporary folder we create here is {{xx}}. I think it would be better > if we could unify this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3848) Refactor Environment::mkdtemp into TemporaryDirectoryTest.
[ https://issues.apache.org/jira/browse/MESOS-3848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15143015#comment-15143015 ] Joseph Wu commented on MESOS-3848: -- There are a couple tests that need multiple working directories. These are the tests that currently use {{environment->mkdtemp}}. > Refactor Environment::mkdtemp into TemporaryDirectoryTest. > -- > > Key: MESOS-3848 > URL: https://issues.apache.org/jira/browse/MESOS-3848 > Project: Mesos > Issue Type: Task > Components: test >Reporter: Joseph Wu >Assignee: Joseph Wu >Priority: Minor > Labels: mesosphere > > As part of [MESOS-3762], many tests were changed from one > {{TemporaryDirectoryTest}} to another {{TemporaryDirectoryTest}}. One subtle > difference is that the name of the temporary directory no longer contains the > name of the test. In [MESOS-3847], the duplicate {{TemporaryDirectoryTest}} > was removed. > The original {{TemporaryDirectoryTest}} called > [{{environment->mkdtemp}}|https://github.com/apache/mesos/blob/master/src/tests/environment.cpp#L494]. > We would like the naming, which is valuable for debugging, to be available > for a majority of tests. (A majority of tests inherit from > {{TemporaryDirectoryTest}} in some way.) > Note: > * Any additional directories created via {{environment->mkdtemp}} are cleaned > up after the test. > * We don't want mesos-specific logic in Stout, like the {{umount}} shell > command in {{Environment::TearDown}}. > *Proposed change:* > Move the temporary directory logic from {{Environment::mkdtemp}} to > {{TemporaryDirectoryTest}}. > *Tests that need to change* > | {{log_tests.cpp}} | {{LogZooKeeperTest}} | We can change {{ZooKeeperTest}} > to inherit from {{TemporaryDirectoryTest}} to get rid of code duplication | > | {{tests/mesos.cpp}} | {{MesosTest::CreateSlaveFlags}} | {{MesosTest}} > already inherits from {{TemporaryDirectoryTest}}. | > | {{tests/script.hpp}} | {{TEST_SCRIPT}} | This is used for the > {{ExampleTests}}. 
We can define a test class that inherits appropriately. | > | {{docker_tests.cpp}} | {{*}} | Already inherits from {{MesosTest}}. | -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2971) Implement OverlayFS based provisioner backend
[ https://issues.apache.org/jira/browse/MESOS-2971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu updated MESOS-2971: -- Sprint: Mesosphere Sprint 29 > Implement OverlayFS based provisioner backend > - > > Key: MESOS-2971 > URL: https://issues.apache.org/jira/browse/MESOS-2971 > Project: Mesos > Issue Type: Improvement >Reporter: Timothy Chen >Assignee: Mei Wan > Labels: mesosphere, twitter, unified-containerizer-mvp > > Part of the image provisioning process is to call a backend to create a root > filesystem based on the image's on-disk layout. > The problem with the copy backend is that it wastes both IO and space, > and the bind backend can only deal with one layer. > An OverlayFS backend allows us to utilize the filesystem to merge multiple > filesystems into one efficiently. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
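To make the layer merging in MESOS-2971 concrete, the sketch below builds the option string that an `overlay` mount takes. It is a hedged illustration, not the Mesos backend: `overlayOptions` and the example paths are hypothetical, and it assumes a kernel whose overlayfs accepts several colon-separated `lowerdir` entries, with the top-most layer listed first.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: builds the data string for an "overlay" mount.
// `layers` is ordered bottom-most first (base image layer first).
// overlayfs expects the top-most lower layer first, so the list is
// emitted in reverse order.
std::string overlayOptions(
    const std::vector<std::string>& layers,
    const std::string& upperdir,
    const std::string& workdir)
{
  std::string lowerdir;
  for (auto it = layers.rbegin(); it != layers.rend(); ++it) {
    if (!lowerdir.empty()) {
      lowerdir += ":";
    }
    lowerdir += *it;
  }

  return "lowerdir=" + lowerdir +
         ",upperdir=" + upperdir +
         ",workdir=" + workdir;
}
```

The resulting string would be passed as the data argument of a `mount` call with filesystem type `overlay`. Unlike a copy, no layer contents are duplicated, and unlike a bind mount, more than one read-only layer can be stacked under a single writable directory.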