[jira] [Commented] (MESOS-8873) StorageLocalResourceProviderTest.ROOT_ZeroSizedDisk is flaky.
[ https://issues.apache.org/jira/browse/MESOS-8873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16702673#comment-16702673 ]

Chun-Hung Hsiao commented on MESOS-8873:

Observed the same flakiness in test {{StorageLocalResourceProviderTest.ROOT_RetryOperationStatusUpdateAfterRecovery}} on our internal 1.6.x CI:
{noformat}
../../src/tests/storage_local_resource_provider_tests.cpp:2706: Failure
Value of: updateSlave2->has_resource_providers()
  Actual: false
Expected: true
{noformat}
The root cause seems to be the same as for this issue: {{NoopResourceEstimator}} returns zero oversubscribed resources, which triggers a second {{UpdateSlaveMessage}} containing no resource provider.

> StorageLocalResourceProviderTest.ROOT_ZeroSizedDisk is flaky.
> -------------------------------------------------------------
>
> Key: MESOS-8873
> URL: https://issues.apache.org/jira/browse/MESOS-8873
> Project: Mesos
> Issue Type: Bug
> Components: test
> Affects Versions: 1.6.0
> Reporter: Chun-Hung Hsiao
> Assignee: Chun-Hung Hsiao
> Priority: Major
> Labels: flaky-test, mesosphere, storage
> Fix For: 1.7.0
>
> Attachments: ZeroSizedDisk.txt
>
>
> This test is flaky on CI:
> {noformat}
> ../../src/tests/storage_local_resource_provider_tests.cpp:406: Failure
> Value of: updateSlave2->has_resource_providers()
> Actual: false
> Expected: true
> {noformat}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
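To illustrate the root cause named in the comment above: if a test picks the second {{UpdateSlaveMessage}} by arrival order, an extra update triggered by the resource estimator can land in that position and carry no resource provider. The following is only a minimal sketch of selecting the update by content instead. {{UpdateSketch}} and {{firstWithResourceProviders}} are hypothetical names invented for this example; they are not Mesos APIs, and this is not necessarily how the test was fixed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an UpdateSlaveMessage. Only the property
// that the failing assertion inspects is modeled here.
struct UpdateSketch
{
  bool hasResourceProviders;
};

// Returns the index of the first update that carries resource
// providers, or -1 if none does. Selecting by content rather than by
// arrival order means an interleaved update without resource providers
// cannot be mistaken for the update under test.
int firstWithResourceProviders(const std::vector<UpdateSketch>& updates)
{
  for (std::size_t i = 0; i < updates.size(); ++i) {
    if (updates[i].hasResourceProviders) {
      return static_cast<int>(i);
    }
  }

  return -1;
}
```

With the arrival order "registration update, estimator-triggered update, resource provider update", the positional second message has no resource providers, while the content-based lookup finds the third one.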
[jira] [Created] (MESOS-9422) DiskQuotaTest.DiskUsageExceedsQuota is flaky.
Chun-Hung Hsiao created MESOS-9422: -- Summary: DiskQuotaTest.DiskUsageExceedsQuota is flaky. Key: MESOS-9422 URL: https://issues.apache.org/jira/browse/MESOS-9422 Project: Mesos Issue Type: Bug Components: test Affects Versions: 1.6.1 Reporter: Chun-Hung Hsiao Observed a flake on this test in CI: {noformat} I1128 16:53:56.318218 104120320 executor.cpp:687] Forked command at 7880 I1128 16:53:56.318235 174346240 task_status_update_manager.cpp:383] Forwarding task status update TASK_STARTING (Status UUID: fd841117-e5f5-433e-a173-d8b3d5eda7b8) for task 5656ebcb-ed5e-4c0d-96f6-532d88e78c27 of framework f9320cbf-2553-4ce5-9cbd-e8deeea16b79- to the agent I1128 16:53:56.318398 175955968 slave.cpp:5778] Forwarding the update TASK_STARTING (Status UUID: fd841117-e5f5-433e-a173-d8b3d5eda7b8) for task 5656ebcb-ed5e-4c0d-96f6-532d88e78c27 of framework f9320cbf-2553-4ce5-9cbd-e8deeea16b79- to master@10.0.49.4:56289 I1128 16:53:56.318568 175955968 slave.cpp:5671] Task status update manager successfully handled status update TASK_STARTING (Status UUID: fd841117-e5f5-433e-a173-d8b3d5eda7b8) for task 5656ebcb-ed5e-4c0d-96f6-532d88e78c27 of framework f9320cbf-2553-4ce5-9cbd-e8deeea16b79- I1128 16:53:56.318614 175955968 slave.cpp:5687] Sending acknowledgement for status update TASK_STARTING (Status UUID: fd841117-e5f5-433e-a173-d8b3d5eda7b8) for task 5656ebcb-ed5e-4c0d-96f6-532d88e78c27 of framework f9320cbf-2553-4ce5-9cbd-e8deeea16b79- to executor(1)@10.0.49.4:56508 I1128 16:53:56.318817 173809664 master.cpp:8332] Status update TASK_STARTING (Status UUID: fd841117-e5f5-433e-a173-d8b3d5eda7b8) for task 5656ebcb-ed5e-4c0d-96f6-532d88e78c27 of framework f9320cbf-2553-4ce5-9cbd-e8deeea16b79- from agent f9320cbf-2553-4ce5-9cbd-e8deeea16b79-S0 at slave(37)@10.0.49.4:56289 (Jenkinss-Mac-mini.local) I1128 16:53:56.318872 173809664 master.cpp:8389] Forwarding status update TASK_STARTING (Status UUID: fd841117-e5f5-433e-a173-d8b3d5eda7b8) for task 5656ebcb-ed5e-4c0d-96f6-532d88e78c27 of 
framework f9320cbf-2553-4ce5-9cbd-e8deeea16b79- I1128 16:53:56.318972 173809664 master.cpp:10842] Updating the state of task 5656ebcb-ed5e-4c0d-96f6-532d88e78c27 of framework f9320cbf-2553-4ce5-9cbd-e8deeea16b79- (latest state: TASK_STARTING, status update state: TASK_STARTING) I1128 16:53:56.319315 174882816 sched.cpp:1022] Scheduler::statusUpdate took 203974ns I1128 16:53:56.319394 177029120 slave.cpp:5286] Handling status update TASK_RUNNING (Status UUID: 759fb8db-a319-4409-a0c8-238484295637) for task 5656ebcb-ed5e-4c0d-96f6-532d88e78c27 of framework f9320cbf-2553-4ce5-9cbd-e8deeea16b79- from executor(1)@10.0.49.4:56508 I1128 16:53:56.319638 173273088 master.cpp:6188] Processing ACKNOWLEDGE call for status fd841117-e5f5-433e-a173-d8b3d5eda7b8 for task 5656ebcb-ed5e-4c0d-96f6-532d88e78c27 of framework f9320cbf-2553-4ce5-9cbd-e8deeea16b79- (default) at scheduler-f8c2522b-fbc4--add8-a781092521a0@10.0.49.4:56289 on agent f9320cbf-2553-4ce5-9cbd-e8deeea16b79-S0 I1128 16:53:56.320201 175955968 task_status_update_manager.cpp:401] Received task status update acknowledgement (UUID: fd841117-e5f5-433e-a173-d8b3d5eda7b8) for task 5656ebcb-ed5e-4c0d-96f6-532d88e78c27 of framework f9320cbf-2553-4ce5-9cbd-e8deeea16b79- I1128 16:53:56.320471 173273088 slave.cpp:4522] Task status update manager successfully handled status update acknowledgement (UUID: fd841117-e5f5-433e-a173-d8b3d5eda7b8) for task 5656ebcb-ed5e-4c0d-96f6-532d88e78c27 of framework f9320cbf-2553-4ce5-9cbd-e8deeea16b79- I1128 16:53:56.320839 174882816 task_status_update_manager.cpp:328] Received task status update TASK_RUNNING (Status UUID: 759fb8db-a319-4409-a0c8-238484295637) for task 5656ebcb-ed5e-4c0d-96f6-532d88e78c27 of framework f9320cbf-2553-4ce5-9cbd-e8deeea16b79- I1128 16:53:56.320909 174882816 task_status_update_manager.cpp:383] Forwarding task status update TASK_RUNNING (Status UUID: 759fb8db-a319-4409-a0c8-238484295637) for task 5656ebcb-ed5e-4c0d-96f6-532d88e78c27 of framework 
f9320cbf-2553-4ce5-9cbd-e8deeea16b79- to the agent I1128 16:53:56.320989 173273088 slave.cpp:5778] Forwarding the update TASK_RUNNING (Status UUID: 759fb8db-a319-4409-a0c8-238484295637) for task 5656ebcb-ed5e-4c0d-96f6-532d88e78c27 of framework f9320cbf-2553-4ce5-9cbd-e8deeea16b79- to master@10.0.49.4:56289 I1128 16:53:56.321148 173273088 slave.cpp:5671] Task status update manager successfully handled status update TASK_RUNNING (Status UUID: 759fb8db-a319-4409-a0c8-238484295637) for task 5656ebcb-ed5e-4c0d-96f6-532d88e78c27 of framework f9320cbf-2553-4ce5-9cbd-e8deeea16b79- I1128 16:53:56.321182 173273088 slave.cpp:5687] Sending acknowledgement for status update TASK_RUNNING (Status UUID: 759fb8db-a319-4409-a0c8-238484295637) for task
[jira] [Commented] (MESOS-4646) PortMappingIsolatorTests get kernel stuck.
[ https://issues.apache.org/jira/browse/MESOS-4646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16702567#comment-16702567 ]

Till Toenshoff commented on MESOS-4646:
---------------------------------------

[~wangcong] we would love to get this solved as otherwise our CI keeps coming back red. Would be terrific if you could try the given setup on this test.

> PortMappingIsolatorTests get kernel stuck.
> ------------------------------------------
>
> Key: MESOS-4646
> URL: https://issues.apache.org/jira/browse/MESOS-4646
> Project: Mesos
> Issue Type: Bug
> Affects Versions: 1.8.0
> Environment: Linux Kernel 3.19.9-49-generic,
> libnl-3.2.27
> Reporter: Till Toenshoff
> Priority: Major
> Labels: flaky, flaky-test
>
> {noformat}
> $ sudo ./bin/mesos-tests.sh --gtest_filter="*PortMappingIsolatorTest*"
> Source directory: /home/till/scratchpad/mesos
> Build directory: /home/till/scratchpad/mesos/build
> -------------------------------------------------------------
> We cannot run any cgroups tests that require mounting
> hierarchies because you have the following hierarchies mounted:
> /sys/fs/cgroup/blkio, /sys/fs/cgroup/cpu, /sys/fs/cgroup/cpuacct,
> /sys/fs/cgroup/cpuset, /sys/fs/cgroup/devices, /sys/fs/cgroup/freezer,
> /sys/fs/cgroup/hugetlb, /sys/fs/cgroup/memory, /sys/fs/cgroup/net_cls,
> /sys/fs/cgroup/net_prio, /sys/fs/cgroup/perf_event, /sys/fs/cgroup/systemd
> We'll disable the CgroupsNoHierarchyTest test fixture for now.
> -------------------------------------------------------------
> WARNING: perf not found for kernel 3.19.0-49
> You may need to install the following packages for this specific kernel:
> linux-tools-3.19.0-49-generic
> linux-cloud-tools-3.19.0-49-generic
> You may also want to install one of the following packages to keep up to
> date:
> linux-tools-generic-lts-
> linux-cloud-tools-generic-lts-
> -------------------------------------------------------------
> No 'perf' command found so no 'perf' tests will be run
> -------------------------------------------------------------
> WARNING: perf not found for kernel 3.19.0-49
> You may need to install the following packages for this specific kernel:
> linux-tools-3.19.0-49-generic
> linux-cloud-tools-3.19.0-49-generic
> You may also want to install one of the following packages to keep up to
> date:
> linux-tools-generic-lts-
> linux-cloud-tools-generic-lts-
> -------------------------------------------------------------
> The 'perf' command wasn't found so tests using it
> to sample the 'cycles' hardware event will not be run.
> -------------------------------------------------------------
> /bin/nc
> /usr/local/bin/curl
> Note: Google Test filter =
>
[jira] [Commented] (MESOS-9419) Executor to framework message crashes master if framework has not re-registered.
[ https://issues.apache.org/jira/browse/MESOS-9419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16702527#comment-16702527 ] Chun-Hung Hsiao commented on MESOS-9419: Backported to 1.7.x, 1.6.x, 1.5.x, and the unofficially-maintained 1.4.x as well. 1.7.x: {noformat} commit bd74257ff8ab8d7bd305aa694c3cd7cbd6840af0 Author: Chun-Hung Hsiao Date: Mon Nov 26 20:12:36 2018 -0800 Fixed master crash when executors send messages to recovered frameworks. The `Framework::send` function assumes that either `http` or `pid` is set, which is not true for a framework that hasn't yet reregistered yet but recovered from a reregistered agent. As a result, the master would crash when a recovered executor tries to send a message to such a framework (see MESOS-9419). This patch fixes this crash bug. Review: https://reviews.apache.org/r/69451{noformat} 1.6.x: {noformat} commit 2d7cb6b60d6cdd3c1dbe1470f0afa044ae78c10c Author: Chun-Hung Hsiao Date: Mon Nov 26 20:12:36 2018 -0800 Fixed master crash when executors send messages to recovered frameworks. The `Framework::send` function assumes that either `http` or `pid` is set, which is not true for a framework that hasn't yet reregistered yet but recovered from a reregistered agent. As a result, the master would crash when a recovered executor tries to send a message to such a framework (see MESOS-9419). This patch fixes this crash bug. Review: https://reviews.apache.org/r/69451{noformat} 1.5.x: {noformat} commit d27d057b7769eafa3e967763a073a2841520e050 Author: Chun-Hung Hsiao Date: Mon Nov 26 20:12:36 2018 -0800 Fixed master crash when executors send messages to recovered frameworks. The `Framework::send` function assumes that either `http` or `pid` is set, which is not true for a framework that hasn't yet reregistered yet but recovered from a reregistered agent. As a result, the master would crash when a recovered executor tries to send a message to such a framework (see MESOS-9419). This patch fixes this crash bug. 
Review: https://reviews.apache.org/r/69451{noformat} > Executor to framework message crashes master if framework has not > re-registered. > > > Key: MESOS-9419 > URL: https://issues.apache.org/jira/browse/MESOS-9419 > Project: Mesos > Issue Type: Bug > Components: master >Affects Versions: 1.2.0, 1.2.1, 1.2.2, 1.2.3, 1.3.0, 1.3.1, 1.3.2, 1.4.0, > 1.4.1, 1.4.2, 1.5.0, 1.5.1, 1.6.0, 1.6.1, 1.7.0 >Reporter: Benjamin Mahler >Assignee: Chun-Hung Hsiao >Priority: Blocker > Fix For: 1.5.2, 1.6.2, 1.7.1, 1.8.0 > > > If the executor sends a framework message after a master failover, and the > framework has not yet re-registered with the master, this will crash the > master: > {code} > W20181105 22:02:48.782819 172709 master.hpp:2304] Master attempted to send > message to disconnected framework 03dc2603-acd6-491e-\ 8717-3f03e5ee37f4- > (Cook-1.24.0-9299b474217db499c9d28738050b359ac8dd55bb) > F20181105 22:02:48.782830 172709 master.hpp:2314] CHECK_SOME(pid): is NONE > *** Check failure stack trace: *** > *** @ 0x7f09e016b6cd google::LogMessage::Fail() > *** @ 0x7f09e016d38d google::LogMessage::SendToLog() > *** @ 0x7f09e016b2b3 google::LogMessage::Flush() > *** @ 0x7f09e016de09 google::LogMessageFatal::~LogMessageFatal() > *** @ 0x7f09df086228 _CheckFatal::~_CheckFatal() > *** @ 0x7f09df3a403d mesos::internal::master::Framework::send<>() > *** @ 0x7f09df2f4886 mesos::internal::master::Master::executorMessage() > *** @ 0x7f09df3b06a4 > _ZN15ProtobufProcessIN5mesos8internal6master6MasterEE8handlerNINS1_26ExecutorToFrameworkMessageEJRKNS0\ > > _7SlaveIDERKNS0_11FrameworkIDERKNS0_10ExecutorIDERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcJS9_SC_SF_SN_EEEvPS3_MS3\ > _FvRKN7process4UPIDEDpT1_ESS_SN_DpMT_KFT0_vE @ 0x7f09df345b43 > std::_Function_handler<>::_M_invoke() > *** @ 0x7f09df36930f ProtobufProcess<>::consume() > *** @ 0x7f09df2e0ff5 mesos::internal::master::Master::_consume() > *** @ 0x7f09df2f5542 mesos::internal::master::Master::consume() > *** @ 0x7f09e00d9c7a 
process::ProcessManager::resume() > *** @ 0x7f09e00dd836 > _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUlvE_vEEE6_M_runEv > *** @ 0x7f09dd467ac8 execute_native_thread_routine > *** @ 0x7f09dd6f6b50 start_thread > *** @ 0x7f09dcc7030d (unknown) > {code} > This is because Framework::send proceeds if the framework is disconnected. In > the case of a recovered framework, it will not have a pid or http connection > yet: > https://github.com/apache/mesos/blob/9b889a10927b13510a1d02e7328925dba3438a0b/src/master/master.hpp#L2590-L2610 > {code} > // Sends a message to the connected framework. > template > void Framework::send(const Message& message) > { > if (!connected()) { >
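The quoted snippet is cut off in this archive. As a rough illustration of the failure mode described above (a recovered framework has neither an http connection nor a pid, yet the send path assumes one of them is set), here is a self-contained sketch of a send function that drops the message instead of aborting. {{FrameworkSketch}}, {{SendResult}} and the use of empty strings for "not set" are assumptions made for this example; this is not the code in {{master.hpp}} and not necessarily what the actual patch does.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Outcome of a send attempt in this sketch.
enum class SendResult { SentHttp, SentPid, Dropped };

// Hypothetical stand-in for the master's per-framework state. An empty
// string means "not set", which models a framework that was recovered
// from a reregistered agent but has not reregistered itself yet.
struct FrameworkSketch
{
  std::string http;
  std::string pid;

  // Delivers `message` over whichever endpoint is set and appends a
  // record to `wire`. If neither endpoint is set, the message is
  // dropped rather than tripping a fatal check.
  SendResult send(
      const std::string& message,
      std::vector<std::string>& wire) const
  {
    if (!http.empty()) {
      wire.push_back(http + " <- " + message);
      return SendResult::SentHttp;
    }

    if (!pid.empty()) {
      wire.push_back(pid + " <- " + message);
      return SendResult::SentPid;
    }

    // Recovered framework: no endpoint is known yet.
    return SendResult::Dropped;
  }
};
```

In this sketch, a default-constructed {{FrameworkSketch}} plays the role of the recovered framework: sending to it returns {{SendResult::Dropped}} and leaves {{wire}} untouched, whereas the stack trace above shows the master aborting on {{CHECK_SOME(pid)}} in the same situation.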
[jira] [Assigned] (MESOS-9247) MasterAPITest.EventAuthorizationFiltering is flaky
[ https://issues.apache.org/jira/browse/MESOS-9247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kone reassigned MESOS-9247:
---------------------------------

Assignee: Till Toenshoff

> MasterAPITest.EventAuthorizationFiltering is flaky
> --------------------------------------------------
>
> Key: MESOS-9247
> URL: https://issues.apache.org/jira/browse/MESOS-9247
> Project: Mesos
> Issue Type: Bug
> Components: test
> Affects Versions: 1.7.0
> Reporter: Greg Mann
> Assignee: Till Toenshoff
> Priority: Major
> Labels: flaky, flaky-test, integration, mesosphere
> Attachments: MasterAPITest.EventAuthorizationFiltering.txt
>
>
> Saw this failure on a CentOS 6 SSL build in our internal CI. Build log
> attached. For some reason, it seems that the initial {{TASK_ADDED}} event is
> missed:
> {code}
> ../../src/tests/api_tests.cpp:2922
> Expected: v1::master::Event::TASK_ADDED
> Which is: TASK_ADDED
> To be equal to: event->get().type()
> Which is: TASK_UPDATED
> {code}
[jira] [Created] (MESOS-9421) ZooKeeperTest.Auth is flaky.
Till Toenshoff created MESOS-9421: - Summary: ZooKeeperTest.Auth is flaky. Key: MESOS-9421 URL: https://issues.apache.org/jira/browse/MESOS-9421 Project: Mesos Issue Type: Bug Components: test Affects Versions: 1.8.0 Environment: macOS Reporter: Till Toenshoff {noformat} 05:37:27 [ RUN ] ZooKeeperTest.Auth 05:37:27 I1127 21:37:44.361042 2668864320 zookeeper_test_server.cpp:156] Started ZooKeeperTestServer on port 53751 05:37:27 2018-11-27 21:37:44,361:80931(0x77bc3000):ZOO_INFO@log_env@753: Client environment:zookeeper.version=zookeeper C client 3.4.8 05:37:27 2018-11-27 21:37:44,361:80931(0x77bc3000):ZOO_INFO@log_env@757: Client environment:host.name=Jenkinss-Mac-mini.local 05:37:27 2018-11-27 21:37:44,361:80931(0x77bc3000):ZOO_INFO@log_env@764: Client environment:os.name=Darwin 05:37:27 2018-11-27 21:37:44,361:80931(0x77bc3000):ZOO_INFO@log_env@765: Client environment:os.arch=17.4.0 05:37:27 2018-11-27 21:37:44,361:80931(0x77bc3000):ZOO_INFO@log_env@766: Client environment:os.version=Darwin Kernel Version 17.4.0: Sun Dec 17 09:19:54 PST 2017; root:xnu-4570.41.2~1/RELEASE_X86_64 05:37:27 2018-11-27 21:37:44,361:80931(0x77bc3000):ZOO_INFO@log_env@774: Client environment:user.name=jenkins 05:37:27 2018-11-27 21:37:44,361:80931(0x77bc3000):ZOO_INFO@log_env@782: Client environment:user.home=/Users/jenkins 05:37:27 2018-11-27 21:37:44,361:80931(0x77bc3000):ZOO_INFO@log_env@794: Client environment:user.dir=/Users/jenkins/workspace/workspace/mesos/Mesos_CI-build/FLAG/SSL/label/mac/mesos/build 05:37:27 2018-11-27 21:37:44,361:80931(0x77bc3000):ZOO_INFO@zookeeper_init@827: Initiating client connection, host=127.0.0.1:53751 sessionTimeout=1 watcher=0x10dbf7dc0 sessionId=0 sessionPasswd= context=0x7f9c4b2d9600 flags=0 05:37:27 2018-11-27 21:37:44,361:80931(0x78567000):ZOO_INFO@check_events@1764: initiated connection to server [127.0.0.1:53751] 05:37:27 2018-11-27 21:37:44,365:80931(0x78567000):ZOO_INFO@check_events@1811: session establishment complete on server 
[127.0.0.1:53751], sessionId=0x16758d2b9a8, negotiated timeout=1 05:37:31 2018-11-27 21:37:47,699:80931(0x78567000):ZOO_INFO@auth_completion_func@1327: Authentication scheme digest succeeded 05:37:31 2018-11-27 21:37:47,702:80931(0x77cc9000):ZOO_INFO@log_env@753: Client environment:zookeeper.version=zookeeper C client 3.4.8 05:37:31 2018-11-27 21:37:47,702:80931(0x77cc9000):ZOO_INFO@log_env@757: Client environment:host.name=Jenkinss-Mac-mini.local 05:37:31 2018-11-27 21:37:47,702:80931(0x77cc9000):ZOO_INFO@log_env@764: Client environment:os.name=Darwin 05:37:31 2018-11-27 21:37:47,702:80931(0x77cc9000):ZOO_INFO@log_env@765: Client environment:os.arch=17.4.0 05:37:31 2018-11-27 21:37:47,702:80931(0x77cc9000):ZOO_INFO@log_env@766: Client environment:os.version=Darwin Kernel Version 17.4.0: Sun Dec 17 09:19:54 PST 2017; root:xnu-4570.41.2~1/RELEASE_X86_64 05:37:31 2018-11-27 21:37:47,702:80931(0x77cc9000):ZOO_INFO@log_env@774: Client environment:user.name=jenkins 05:37:31 2018-11-27 21:37:47,702:80931(0x77cc9000):ZOO_INFO@log_env@782: Client environment:user.home=/Users/jenkins 05:37:31 2018-11-27 21:37:47,702:80931(0x77cc9000):ZOO_INFO@log_env@794: Client environment:user.dir=/Users/jenkins/workspace/workspace/mesos/Mesos_CI-build/FLAG/SSL/label/mac/mesos/build 05:37:31 2018-11-27 21:37:47,702:80931(0x77cc9000):ZOO_INFO@zookeeper_init@827: Initiating client connection, host=127.0.0.1:53751 sessionTimeout=1 watcher=0x10dbf7dc0 sessionId=0 sessionPasswd= context=0x7f9c487174f0 flags=0 05:37:31 2018-11-27 21:37:47,702:80931(0x78d82000):ZOO_INFO@check_events@1764: initiated connection to server [127.0.0.1:53751] 05:37:31 2018-11-27 21:37:47,703:80931(0x78d82000):ZOO_INFO@check_events@1811: session establishment complete on server [127.0.0.1:53751], sessionId=0x16758d2b9a80001, negotiated timeout=1 05:37:31 2018-11-27 21:37:47,704:80931(0x77b4):ZOO_INFO@log_env@753: Client environment:zookeeper.version=zookeeper C client 3.4.8 05:37:31 2018-11-27 
21:37:47,705:80931(0x77b4):ZOO_INFO@log_env@757: Client environment:host.name=Jenkinss-Mac-mini.local 05:37:31 2018-11-27 21:37:47,705:80931(0x77b4):ZOO_INFO@log_env@764: Client environment:os.name=Darwin 05:37:31 2018-11-27 21:37:47,705:80931(0x77b4):ZOO_INFO@log_env@765: Client environment:os.arch=17.4.0 05:37:31 2018-11-27 21:37:47,705:80931(0x77b4):ZOO_INFO@log_env@766: Client environment:os.version=Darwin Kernel Version 17.4.0: Sun Dec 17 09:19:54 PST 2017; root:xnu-4570.41.2~1/RELEASE_X86_64 05:37:31 2018-11-27 21:37:47,705:80931(0x77b4):ZOO_INFO@log_env@774: Client environment:user.name=jenkins 05:37:31 2018-11-27