[ https://issues.apache.org/jira/browse/MESOS-3747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14975241#comment-14975241 ]
Vinod Kone commented on MESOS-3747:
-----------------------------------

I would suggest tackling 2) first. You could add a new "REASON_USER_UNKNOWN" reason and send it with a TASK_ERROR in Slave::_runTask(). (TASK_LOST might not be appropriate here because a reschedule will likely result in the same error.)

> HTTP Scheduler API no longer allows FrameworkInfo.user to be empty string
> -------------------------------------------------------------------------
>
>                 Key: MESOS-3747
>                 URL: https://issues.apache.org/jira/browse/MESOS-3747
>             Project: Mesos
>          Issue Type: Bug
>          Components: HTTP API
>   Affects Versions: 0.24.0, 0.24.1, 0.25.0
>            Reporter: Ben Whitehead
>            Assignee: Liqiang Lin
>            Priority: Blocker
>
> When using libmesos, a framework can set its user to {{""}} (empty string) to
> inherit the user the agent process is running as. Through the HTTP Scheduler
> API, the same setting now results in a {{TASK_FAILED}}.
> Full messages and relevant agent logs are below.
> The error returned to the framework says nothing about the user not existing
> on the agent host; instead, it reports that the container died due to OOM.
> {code:title=FrameworkInfo}
> call {
>   type: SUBSCRIBE
>   subscribe: {
>     frameworkInfo: {
>       user: "",
>       name: "testing"
>     }
>   }
> }
> {code}
> {code:title=TaskInfo}
> call {
>   framework_id { value: "20151015-125949-16777343-5050-20146-0000" },
>   type: ACCEPT,
>   accept {
>     offer_ids: [{ value: "20151015-125949-16777343-5050-20146-O0" }],
>     operations {
>       type: LAUNCH,
>       launch {
>         task_infos [
>           {
>             name: "task-1",
>             task_id: { value: "task-1" },
>             agent_id: { value: "20151015-125949-16777343-5050-20146-S0" },
>             resources [
>               { name: "cpus", type: SCALAR, scalar: { value: 0.1 }, role: "*" },
>               { name: "mem", type: SCALAR, scalar: { value: 64.0 }, role: "*" },
>               { name: "disk", type: SCALAR, scalar: { value: 0.0 }, role: "*" },
>             ],
>             command: {
>               environment {
>                 variables [
>                   { name: "SLEEP_SECONDS" value: "15" }
>                 ]
>               },
>               value: "env | sort && sleep $SLEEP_SECONDS"
>             }
>           }
>         ]
>       }
>     }
>   }
> }
> {code}
> {code:title=Update Status}
> event: {
>   type: UPDATE,
>   update: {
>     status: {
>       task_id: { value: "task-1" },
>       state: TASK_FAILED,
>       message: "Container destroyed while preparing isolators",
>       agent_id: { value: "20151015-125949-16777343-5050-20146-S0" },
>       timestamp: 1.444939217401241E9,
>       executor_id: { value: "task-1" },
>       source: SOURCE_AGENT,
>       reason: REASON_MEMORY_LIMIT,
>       uuid: "\237g()L\026EQ\222\301\261\265\\\221\224|"
>     }
>   }
> }
> {code}
> {code:title=agent logs}
> I1015 13:15:34.260592 19639 slave.cpp:1270] Got assigned task task-1 for framework e4de5b96-41cc-4713-af44-7cffbdd63ba6-0000
> I1015 13:15:34.260921 19639 slave.cpp:1386] Launching task task-1 for framework e4de5b96-41cc-4713-af44-7cffbdd63ba6-0000
> W1015 13:15:34.262243 19639 paths.cpp:423] Failed to chown executor directory '/home/ben.whitehead/opt/mesos/work/slave/work_dir/slaves/e4de5b96-41cc-4713-af44-7cffbdd63ba6-S0/frameworks/e4de5b96-41cc-4713-af44-7cffbdd63ba6-0000/executors/task-1/runs/3958ff84-8dd9-4c3c-995d-5aba5250541b': Failed to get user information for '': Success
> I1015 13:15:34.262444 19639 slave.cpp:4852] Launching executor task-1 of framework e4de5b96-41cc-4713-af44-7cffbdd63ba6-0000 with resources cpus(*):0.1; mem(*):32 in work directory '/home/ben.whitehead/opt/mesos/work/slave/work_dir/slaves/e4de5b96-41cc-4713-af44-7cffbdd63ba6-S0/frameworks/e4de5b96-41cc-4713-af44-7cffbdd63ba6-0000/executors/task-1/runs/3958ff84-8dd9-4c3c-995d-5aba5250541b'
> I1015 13:15:34.262581 19639 slave.cpp:1604] Queuing task 'task-1' for executor task-1 of framework 'e4de5b96-41cc-4713-af44-7cffbdd63ba6-0000
> I1015 13:15:34.262684 19638 docker.cpp:734] No container info found, skipping launch
> I1015 13:15:34.263478 19638 containerizer.cpp:640] Starting container '3958ff84-8dd9-4c3c-995d-5aba5250541b' for executor 'task-1' of framework 'e4de5b96-41cc-4713-af44-7cffbdd63ba6-0000'
> E1015 13:15:34.264516 19641 slave.cpp:3342] Container '3958ff84-8dd9-4c3c-995d-5aba5250541b' for executor 'task-1' of framework 'e4de5b96-41cc-4713-af44-7cffbdd63ba6-0000' failed to start: Failed to prepare isolator: Failed to get user information for '': Success
> I1015 13:15:34.264681 19636 containerizer.cpp:1097] Destroying container '3958ff84-8dd9-4c3c-995d-5aba5250541b'
> I1015 13:15:34.265997 19636 slave.cpp:3433] Executor 'task-1' of framework e4de5b96-41cc-4713-af44-7cffbdd63ba6-0000 has terminated with unknown status
> I1015 13:15:34.266568 19636 slave.cpp:2717] Handling status update TASK_FAILED (UUID: 6e45302e-72a4-442f-8056-6154eab5e265) for task task-1 of framework e4de5b96-41cc-4713-af44-7cffbdd63ba6-0000 from @0.0.0.0:0
> W1015 13:15:34.266695 19636 containerizer.cpp:988] Ignoring update for unknown container: 3958ff84-8dd9-4c3c-995d-5aba5250541b
> I1015 13:15:34.266772 19638 status_update_manager.cpp:322] Received status update TASK_FAILED (UUID: 6e45302e-72a4-442f-8056-6154eab5e265) for task task-1 of framework e4de5b96-41cc-4713-af44-7cffbdd63ba6-0000
> I1015 13:15:34.266885 19636 slave.cpp:3016] Forwarding the update TASK_FAILED (UUID: 6e45302e-72a4-442f-8056-6154eab5e265) for task task-1 of framework e4de5b96-41cc-4713-af44-7cffbdd63ba6-0000 to master@127.0.0.1:5050
> I1015 13:15:35.255997 19638 status_update_manager.cpp:394] Received status update acknowledgement (UUID: 6e45302e-72a4-442f-8056-6154eab5e265) for task task-1 of framework e4de5b96-41cc-4713-af44-7cffbdd63ba6-0000
> I1015 13:15:35.256165 19640 slave.cpp:3544] Cleaning up executor 'task-1' of framework e4de5b96-41cc-4713-af44-7cffbdd63ba6-0000
> I1015 13:15:35.256273 19641 gc.cpp:56] Scheduling '/home/ben.whitehead/opt/mesos/work/slave/work_dir/slaves/e4de5b96-41cc-4713-af44-7cffbdd63ba6-S0/frameworks/e4de5b96-41cc-4713-af44-7cffbdd63ba6-0000/executors/task-1/runs/3958ff84-8dd9-4c3c-995d-5aba5250541b' for gc 6.99999703411852days in the future
> I1015 13:15:35.256283 19640 slave.cpp:3633] Cleaning up framework e4de5b96-41cc-4713-af44-7cffbdd63ba6-0000
> I1015 13:15:35.256340 19641 gc.cpp:56] Scheduling '/home/ben.whitehead/opt/mesos/work/slave/work_dir/slaves/e4de5b96-41cc-4713-af44-7cffbdd63ba6-S0/frameworks/e4de5b96-41cc-4713-af44-7cffbdd63ba6-0000/executors/task-1' for gc 6.99999703386667days in the future
> I1015 13:15:35.256350 19634 status_update_manager.cpp:284] Closing status update streams for framework e4de5b96-41cc-4713-af44-7cffbdd63ba6-0000
> I1015 13:15:35.256377 19641 gc.cpp:56] Scheduling '/home/ben.whitehead/opt/mesos/work/slave/work_dir/slaves/e4de5b96-41cc-4713-af44-7cffbdd63ba6-S0/frameworks/e4de5b96-41cc-4713-af44-7cffbdd63ba6-0000' for gc 6.99999703291556days in the future
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
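
To illustrate the suggestion in the comment at the top of this message, here is a rough sketch of the check it implies. It is written in Python for brevity and is not Mesos code: `StatusUpdate`, `resolve_uid`, and `check_task_user` are hypothetical names, and REASON_USER_UNKNOWN is the proposed reason, assumed not to exist yet. The sketch assumes that an empty user means "inherit the user the agent process runs as" (the libmesos behavior described in the report) and that any other name must resolve on the agent host before the task is launched.

```python
import os
import pwd
from dataclasses import dataclass
from typing import Optional


@dataclass
class StatusUpdate:
    """Hypothetical stand-in for a task status update; not the Mesos type."""
    state: str
    reason: str
    message: str


def resolve_uid(requested: str) -> Optional[int]:
    """Return the uid to run as, or None if the named user does not exist.

    An empty string is treated as "inherit the uid of this process".
    """
    if requested == "":
        return os.getuid()
    try:
        return pwd.getpwnam(requested).pw_uid
    except KeyError:
        # pwd.getpwnam raises KeyError when there is no such account.
        return None


def check_task_user(requested: str) -> Optional[StatusUpdate]:
    """Return an error update if the user is unknown, else None (proceed)."""
    if resolve_uid(requested) is not None:
        return None
    return StatusUpdate(
        state="TASK_ERROR",            # terminal: a reschedule would fail again
        reason="REASON_USER_UNKNOWN",  # proposed in the comment, hypothetical
        message=f"Failed to get user information for '{requested}'",
    )
```

With a check like this done before the executor is launched, the framework would see the unknown-user condition directly instead of a TASK_FAILED with REASON_MEMORY_LIMIT.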