> On Nov. 1, 2017, 11 p.m., Gaston Kleiman wrote: > > src/tests/default_executor_tests.cpp > > Lines 538-548 (patched) > > <https://reviews.apache.org/r/62837/diff/2/?file=1873334#file1873334line538> > > > > This is racy; there's no guarantee that the first `TASK_RUNNING` status > > update will come after the second `TASK_STARTING` update. > > > > The following ordering would also be possible: [`TASK_STARTING`, > > `TASK_RUNNING`, `TASK_STARTING`, `TASK_RUNNING`, ...] > > Alexander Rukletsov wrote: > To simplify the logic here, let's introduce a matcher in our tests, > something like > ``` > MATCHER_P(TaskStateEq, state, "") { return arg.state() == state; } > ``` > which we then can use like > ``` > Future<TaskStatus> status; > EXPECT_CALL(sched, statusUpdate(&driver, TaskStateEq(TASK_RUNNING))) > .WillOnce(FutureArg<1>(&status)); > ``` > Then you can combine both task state and task id matchers to get an exact > status update. > > Gaston Kleiman wrote: > AlexR suggested something like this: > > ``` > Sequence task1; > EXPECT_CALL( > *scheduler, > update(_, AllOf(TaskStatusEq(TASK_STARTING), > TaskStateEq(taskInfo1)))) > .InSequence(task1) > .WillOnce( > DoAll( > FutureArg<1>(&startingUpdate1), > v1::scheduler::SendAcknowledge(frameworkId, agentId))); > > EXPECT_CALL( > *scheduler, > update(_, AllOf(TaskStatusEq(TASK_RUNNING), > TaskStateEq(taskInfo1)))) > .InSequence(task1) > .WillOnce( > DoAll( > FutureArg<1>(&runningUpdate1), > v1::scheduler::SendAcknowledge(frameworkId, agentId))); > > Sequence task2; > EXPECT_CALL( > *scheduler, > update(_, AllOf(TaskStatusEq(TASK_STARTING), > TaskStateEq(taskInfo2)))) > .InSequence(task2) > .WillOnce( > DoAll( > FutureArg<1>(&startingUpdate2), > v1::scheduler::SendAcknowledge(frameworkId, agentId))); > > EXPECT_CALL( > *scheduler, > update(_, AllOf(TaskStatusEq(TASK_RUNNING), > TaskStateEq(taskInfo2)))) > .InSequence(task2) > .WillOnce( > DoAll( > FutureArg<1>(&runningUpdate2), > v1::scheduler::SendAcknowledge(frameworkId, agentId))); > > AWAIT_READY(startingUpdate1); > AWAIT_READY(runningUpdate1); > > AWAIT_READY(startingUpdate2); > AWAIT_READY(runningUpdate2); > ``` > > Alexander Rukletsov wrote: > Parameters are mixed up : ) > `AllOf(TaskStatusEq(<taskInfo>), TaskStateEq(<TASK_STATE>))` > Also feel free to refactor `TaskStatusEq` to take an id if you think it > is simpler.
So I think we also have this race for the test `DefaultExecutorTest.KillTaskGroupOnTaskFailure`? See the code here: https://github.com/apache/mesos/blob/master/src/tests/default_executor_tests.cpp#L649:L670 And it seems the test `DefaultExecutorTest.KillTask` (https://github.com/apache/mesos/blob/master/src/tests/default_executor_tests.cpp#L301:L355) handles such race, so what about we follow that way? - Qian ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/62837/#review189800 ----------------------------------------------------------- On Oct. 31, 2017, 12:53 p.m., Qian Zhang wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/62837/ > ----------------------------------------------------------- > > (Updated Oct. 31, 2017, 12:53 p.m.) > > > Review request for mesos and Vinod Kone. > > > Bugs: MESOS-8051 > https://issues.apache.org/jira/browse/MESOS-8051 > > > Repository: mesos > > > Description > ------- > > Added a test `DefaultExecutorTest.KillMultipleTasks`. > > > Diffs > ----- > > src/tests/default_executor_tests.cpp > 5078bd4d70698f5cbd14c971fcecfd58f8467a04 > > > Diff: https://reviews.apache.org/r/62837/diff/2/ > > > Testing > ------- > > [ RUN ] MesosContainerizer/DefaultExecutorTest.KillMultipleTasks/0 > I1030 21:47:58.181413 22360 executor.cpp:192] Version: 1.5.0 > I1030 21:47:58.201525 22382 default_executor.cpp:191] Received SUBSCRIBED > event > I1030 21:47:58.203812 22382 default_executor.cpp:195] Subscribed executor on > core-dev > I1030 21:47:58.204406 22382 default_executor.cpp:191] Received LAUNCH_GROUP > event > I1030 21:47:58.205346 22390 default_executor.cpp:402] Setting > 'MESOS_CONTAINER_IP' to: 10.0.49.2 > I1030 21:47:58.220854 22356 default_executor.cpp:191] Received ACKNOWLEDGED > event > I1030 21:47:58.221959 22367 default_executor.cpp:191] Received ACKNOWLEDGED > event > I1030 21:47:58.261060 22370 default_executor.cpp:640] Successfully launched > tasks [ 4fee2fec-12b5-4af6-bd70-e67b4da26c00, > 24695aee-aa59-4295-8ad9-19f17ad9d52b ] in child containers [ > 94d8e76a-2215-41f9-8a27-c7ff12e96418.a14cf886-fccd-431e-be3b-432a59f85f18, > 94d8e76a-2215-41f9-8a27-c7ff12e96418.2fd5de24-fbed-490b-9914-de45902a8b47 ] > I1030 21:47:58.263293 22385 default_executor.cpp:713] Waiting for child > container > 94d8e76a-2215-41f9-8a27-c7ff12e96418.a14cf886-fccd-431e-be3b-432a59f85f18 of > task '4fee2fec-12b5-4af6-bd70-e67b4da26c00' > I1030 21:47:58.263546 22385 default_executor.cpp:713] Waiting for child > container > 94d8e76a-2215-41f9-8a27-c7ff12e96418.2fd5de24-fbed-490b-9914-de45902a8b47 of > task '24695aee-aa59-4295-8ad9-19f17ad9d52b' > I1030 21:47:58.307412 22351 default_executor.cpp:191] Received ACKNOWLEDGED > event > I1030 21:47:58.307924 22358 default_executor.cpp:191] Received ACKNOWLEDGED > event > I1030 21:47:58.354656 22365 default_executor.cpp:191] Received KILL event > I1030 21:47:58.354730 22365 default_executor.cpp:1172] Received kill for task > '4fee2fec-12b5-4af6-bd70-e67b4da26c00' > I1030 21:47:58.354787 22365 default_executor.cpp:1057] Killing task > 4fee2fec-12b5-4af6-bd70-e67b4da26c00 running in child container > 94d8e76a-2215-41f9-8a27-c7ff12e96418.a14cf886-fccd-431e-be3b-432a59f85f18 > with SIGTERM signal > I1030 21:47:58.354825 22365 default_executor.cpp:1079] Scheduling escalation > to SIGKILL in 3secs from now > I1030 21:47:58.355772 22365 default_executor.cpp:191] Received KILL event > I1030 21:47:58.355819 22365 default_executor.cpp:1172] Received kill for task > '24695aee-aa59-4295-8ad9-19f17ad9d52b' > I1030 21:47:58.355947 22365 default_executor.cpp:1057] Killing task > 24695aee-aa59-4295-8ad9-19f17ad9d52b running in child container > 94d8e76a-2215-41f9-8a27-c7ff12e96418.2fd5de24-fbed-490b-9914-de45902a8b47 > with SIGTERM signal > I1030 21:47:58.355980 22365 default_executor.cpp:1079] Scheduling escalation > to SIGKILL in 3secs from now > I1030 21:47:58.467401 22380 default_executor.cpp:888] Child container > 94d8e76a-2215-41f9-8a27-c7ff12e96418.a14cf886-fccd-431e-be3b-432a59f85f18 of > task '4fee2fec-12b5-4af6-bd70-e67b4da26c00' completed in state TASK_KILLED: > Command terminated with signal Terminated > I1030 21:47:58.467478 22380 default_executor.cpp:924] Killing task group > containing tasks [ 4fee2fec-12b5-4af6-bd70-e67b4da26c00, > 24695aee-aa59-4295-8ad9-19f17ad9d52b ] > I1030 21:47:58.467888 22380 default_executor.cpp:888] Child container > 94d8e76a-2215-41f9-8a27-c7ff12e96418.2fd5de24-fbed-490b-9914-de45902a8b47 of > task '24695aee-aa59-4295-8ad9-19f17ad9d52b' completed in state TASK_KILLED: > Command terminated with signal Terminated > I1030 21:47:58.467934 22380 default_executor.cpp:1017] Terminating after 1secs > [ OK ] MesosContainerizer/DefaultExecutorTest.KillMultipleTasks/0 (922 > ms) > > > [ RUN ] > ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.KillMultipleTasks/0 > I1030 21:51:07.645311 25065 executor.cpp:192] Version: 1.5.0 > I1030 21:51:07.665602 25086 default_executor.cpp:191] Received SUBSCRIBED > event > I1030 21:51:07.667804 25086 default_executor.cpp:195] Subscribed executor on > core-dev > I1030 21:51:07.668308 25086 default_executor.cpp:191] Received LAUNCH_GROUP > event > I1030 21:51:07.669232 25093 default_executor.cpp:402] Setting > 'MESOS_CONTAINER_IP' to: 10.0.49.2 > I1030 21:51:07.684146 25060 default_executor.cpp:191] Received ACKNOWLEDGED > event > I1030 21:51:07.685752 25071 default_executor.cpp:191] Received ACKNOWLEDGED > event > I1030 21:51:07.730559 25077 default_executor.cpp:640] Successfully launched > tasks [ fce3ed1c-d93a-4b6f-8950-771a989b279b, > 2e1b7db2-5f05-4474-bd5e-00c03468db71 ] in child containers [ > 9b58977b-8d1b-4a3d-a5af-624e44ba12f0.6ee13669-0de9-41b6-bde9-9f581b3e9c2e, > 9b58977b-8d1b-4a3d-a5af-624e44ba12f0.d11330e3-4f3a-4686-be5f-935ec8f44111 ] > I1030 21:51:07.732921 25089 default_executor.cpp:713] Waiting for child > container > 9b58977b-8d1b-4a3d-a5af-624e44ba12f0.6ee13669-0de9-41b6-bde9-9f581b3e9c2e of > task 'fce3ed1c-d93a-4b6f-8950-771a989b279b' > I1030 21:51:07.733170 25089 default_executor.cpp:713] Waiting for child > container > 9b58977b-8d1b-4a3d-a5af-624e44ba12f0.d11330e3-4f3a-4686-be5f-935ec8f44111 of > task '2e1b7db2-5f05-4474-bd5e-00c03468db71' > I1030 21:51:07.776757 25056 default_executor.cpp:191] Received ACKNOWLEDGED > event > I1030 21:51:07.777101 25062 default_executor.cpp:191] Received ACKNOWLEDGED > event > I1030 21:51:07.824290 25065 default_executor.cpp:191] Received KILL event > I1030 21:51:07.824533 25065 default_executor.cpp:1172] Received kill for task > 'fce3ed1c-d93a-4b6f-8950-771a989b279b' > I1030 21:51:07.824579 25065 default_executor.cpp:1057] Killing task > fce3ed1c-d93a-4b6f-8950-771a989b279b running in child container > 9b58977b-8d1b-4a3d-a5af-624e44ba12f0.6ee13669-0de9-41b6-bde9-9f581b3e9c2e > with SIGTERM signal > I1030 21:51:07.824606 25065 default_executor.cpp:1079] Scheduling escalation > to SIGKILL in 3secs from now > I1030 21:51:07.825533 25065 default_executor.cpp:191] Received KILL event > I1030 21:51:07.825597 25065 default_executor.cpp:1172] Received kill for task > '2e1b7db2-5f05-4474-bd5e-00c03468db71' > I1030 21:51:07.825647 25065 default_executor.cpp:1057] Killing task > 2e1b7db2-5f05-4474-bd5e-00c03468db71 running in child container > 9b58977b-8d1b-4a3d-a5af-624e44ba12f0.d11330e3-4f3a-4686-be5f-935ec8f44111 > with SIGTERM signal > I1030 21:51:07.825690 25065 default_executor.cpp:1079] Scheduling escalation > to SIGKILL in 3secs from now > I1030 21:51:07.957986 25083 default_executor.cpp:888] Child container > 9b58977b-8d1b-4a3d-a5af-624e44ba12f0.6ee13669-0de9-41b6-bde9-9f581b3e9c2e of > task 'fce3ed1c-d93a-4b6f-8950-771a989b279b' completed in state TASK_KILLED: > Command terminated with signal Terminated > I1030 21:51:07.958062 25083 default_executor.cpp:924] Killing task group > containing tasks [ fce3ed1c-d93a-4b6f-8950-771a989b279b, > 2e1b7db2-5f05-4474-bd5e-00c03468db71 ] > I1030 21:51:07.958566 25083 default_executor.cpp:888] Child container > 9b58977b-8d1b-4a3d-a5af-624e44ba12f0.d11330e3-4f3a-4686-be5f-935ec8f44111 of > task '2e1b7db2-5f05-4474-bd5e-00c03468db71' completed in state TASK_KILLED: > Command terminated with signal Terminated > I1030 21:51:07.958626 25083 default_executor.cpp:1017] Terminating after 1secs > [ OK ] > ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.KillMultipleTasks/0 > (9200 ms) > > > Thanks, > > Qian Zhang > >