[ 
https://issues.apache.org/jira/browse/MESOS-3747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14975241#comment-14975241
 ] 

Vinod Kone commented on MESOS-3747:
-----------------------------------

I would suggest tackling 2) first. You could add a new "REASON_USER_UNKNOWN" 
reason and send it with a TASK_ERROR in Slave::_runTask(). TASK_LOST might not 
be appropriate here because a reschedule will likely result in the same error.
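
A minimal sketch of that suggestion, written in Python rather than the agent's 
C++ so that it stays self-contained. It assumes the user check runs before any 
container is launched. REASON_USER_UNKNOWN is the proposed reason, not an 
existing value, and StatusUpdate, user_exists and check_task_user are 
hypothetical names used for illustration only, not Mesos APIs.

```python
import pwd
from dataclasses import dataclass
from typing import Callable, Optional

# String stand-ins for the protobuf enums. REASON_USER_UNKNOWN is the reason
# proposed in this comment; it is an assumption, not an existing value.
TASK_ERROR = "TASK_ERROR"
REASON_USER_UNKNOWN = "REASON_USER_UNKNOWN"


@dataclass
class StatusUpdate:
    """Hypothetical, simplified status update sent back to the framework."""
    task_id: str
    state: str
    reason: str
    message: str


def user_exists(name: str) -> bool:
    """Return True if the local passwd database has an entry for `name`."""
    try:
        pwd.getpwnam(name)
        return True
    except KeyError:
        # pwd.getpwnam raises KeyError when no entry is found.
        return False


def check_task_user(
    task_id: str,
    user: str,
    lookup: Callable[[str], bool] = user_exists,
) -> Optional[StatusUpdate]:
    """Return a TASK_ERROR update if `user` is unknown, else None."""
    if lookup(user):
        return None
    return StatusUpdate(
        task_id=task_id,
        state=TASK_ERROR,
        reason=REASON_USER_UNKNOWN,
        message=f"Failed to get user information for '{user}'",
    )
```

On a host with no passwd entry for the empty name, check_task_user("task-1", "") 
would return a TASK_ERROR update carrying the proposed reason, rather than the 
TASK_FAILED / REASON_MEMORY_LIMIT pair shown in the report below. The lookup 
parameter is only there so the check can be exercised without touching the real 
passwd database.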

> HTTP Scheduler API no longer allows FrameworkInfo.user to be empty string
> -------------------------------------------------------------------------
>
>                 Key: MESOS-3747
>                 URL: https://issues.apache.org/jira/browse/MESOS-3747
>             Project: Mesos
>          Issue Type: Bug
>          Components: HTTP API
>    Affects Versions: 0.24.0, 0.24.1, 0.25.0
>            Reporter: Ben Whitehead
>            Assignee: Liqiang Lin
>            Priority: Blocker
>
> When using libmesos, a framework can set its user to {{""}} (empty string) to 
> inherit the user the agent process is running as. This behavior now results 
> in a {{TASK_FAILED}}.
> Full messages and relevant agent logs below.
> The error returned to the framework tells me nothing about the user not 
> existing on the agent host; instead, it tells me the container died due to OOM.
> {code:title=FrameworkInfo}
> call {
>     type: SUBSCRIBE
>     subscribe: {
>         frameworkInfo: {
>             user: "",
>             name: "testing"
>         }
>     }
> }
> {code}
> {code:title=TaskInfo}
> call {
>     framework_id { value: "20151015-125949-16777343-5050-20146-0000" },
>     type: ACCEPT,
>     accept { 
>         offer_ids: [{ value: "20151015-125949-16777343-5050-20146-O0" }],
>         operations { 
>             type: LAUNCH, 
>             launch { 
>                 task_infos [
>                     {
>                         name: "task-1",
>                         task_id: { value: "task-1" },
>                         agent_id: { value: "20151015-125949-16777343-5050-20146-S0" },
>                         resources [
>                             { name: "cpus", type: SCALAR, scalar: { value: 0.1 },  role: "*" },
>                             { name: "mem",  type: SCALAR, scalar: { value: 64.0 }, role: "*" },
>                             { name: "disk", type: SCALAR, scalar: { value: 0.0 },  role: "*" },
>                         ],
>                         command: { 
>                             environment { 
>                                 variables [ 
>                                     { name: "SLEEP_SECONDS" value: "15" } 
>                                 ] 
>                             },
>                             value: "env | sort && sleep $SLEEP_SECONDS"
>                         }
>                     }
>                 ]
>              }
>          }
>      }
> }
> {code}
> {code:title=Update Status}
> event: {
>     type: UPDATE,
>     update: { 
>         status: { 
>             task_id: { value: "task-1" }, 
>             state: TASK_FAILED,
>             message: "Container destroyed while preparing isolators",
>             agent_id: { value: "20151015-125949-16777343-5050-20146-S0" }, 
>             timestamp: 1.444939217401241E9,
>             executor_id: { value: "task-1" },
>             source: SOURCE_AGENT, 
>             reason: REASON_MEMORY_LIMIT,
>             uuid: "\237g()L\026EQ\222\301\261\265\\\221\224|" 
>         } 
>     }
> }
> {code}
> {code:title=agent logs}
> I1015 13:15:34.260592 19639 slave.cpp:1270] Got assigned task task-1 for 
> framework e4de5b96-41cc-4713-af44-7cffbdd63ba6-0000
> I1015 13:15:34.260921 19639 slave.cpp:1386] Launching task task-1 for 
> framework e4de5b96-41cc-4713-af44-7cffbdd63ba6-0000
> W1015 13:15:34.262243 19639 paths.cpp:423] Failed to chown executor directory 
> '/home/ben.whitehead/opt/mesos/work/slave/work_dir/slaves/e4de5b96-41cc-4713-af44-7cffbdd63ba6-S0/frameworks/e4de5b96-41cc-4713-af44-7cffbdd63ba6-0000/executors/task-1/runs/3958ff84-8dd9-4c3c-995d-5aba5250541b':
>  Failed to get user information for '': Success
> I1015 13:15:34.262444 19639 slave.cpp:4852] Launching executor task-1 of 
> framework e4de5b96-41cc-4713-af44-7cffbdd63ba6-0000 with resources 
> cpus(*):0.1; mem(*):32 in work directory 
> '/home/ben.whitehead/opt/mesos/work/slave/work_dir/slaves/e4de5b96-41cc-4713-af44-7cffbdd63ba6-S0/frameworks/e4de5b96-41cc-4713-af44-7cffbdd63ba6-0000/executors/task-1/runs/3958ff84-8dd9-4c3c-995d-5aba5250541b'
> I1015 13:15:34.262581 19639 slave.cpp:1604] Queuing task 'task-1' for 
> executor task-1 of framework 'e4de5b96-41cc-4713-af44-7cffbdd63ba6-0000
> I1015 13:15:34.262684 19638 docker.cpp:734] No container info found, skipping 
> launch
> I1015 13:15:34.263478 19638 containerizer.cpp:640] Starting container 
> '3958ff84-8dd9-4c3c-995d-5aba5250541b' for executor 'task-1' of framework 
> 'e4de5b96-41cc-4713-af44-7cffbdd63ba6-0000'
> E1015 13:15:34.264516 19641 slave.cpp:3342] Container 
> '3958ff84-8dd9-4c3c-995d-5aba5250541b' for executor 'task-1' of framework 
> 'e4de5b96-41cc-4713-af44-7cffbdd63ba6-0000' failed to start: Failed to 
> prepare isolator: Failed to get user information for '': Success
> I1015 13:15:34.264681 19636 containerizer.cpp:1097] Destroying container 
> '3958ff84-8dd9-4c3c-995d-5aba5250541b'
> I1015 13:15:34.265997 19636 slave.cpp:3433] Executor 'task-1' of framework 
> e4de5b96-41cc-4713-af44-7cffbdd63ba6-0000 has terminated with unknown status
> I1015 13:15:34.266568 19636 slave.cpp:2717] Handling status update 
> TASK_FAILED (UUID: 6e45302e-72a4-442f-8056-6154eab5e265) for task task-1 of 
> framework e4de5b96-41cc-4713-af44-7cffbdd63ba6-0000 from @0.0.0.0:0
> W1015 13:15:34.266695 19636 containerizer.cpp:988] Ignoring update for 
> unknown container: 3958ff84-8dd9-4c3c-995d-5aba5250541b
> I1015 13:15:34.266772 19638 status_update_manager.cpp:322] Received status 
> update TASK_FAILED (UUID: 6e45302e-72a4-442f-8056-6154eab5e265) for task 
> task-1 of framework e4de5b96-41cc-4713-af44-7cffbdd63ba6-0000
> I1015 13:15:34.266885 19636 slave.cpp:3016] Forwarding the update TASK_FAILED 
> (UUID: 6e45302e-72a4-442f-8056-6154eab5e265) for task task-1 of framework 
> e4de5b96-41cc-4713-af44-7cffbdd63ba6-0000 to master@127.0.0.1:5050
> I1015 13:15:35.255997 19638 status_update_manager.cpp:394] Received status 
> update acknowledgement (UUID: 6e45302e-72a4-442f-8056-6154eab5e265) for task 
> task-1 of framework e4de5b96-41cc-4713-af44-7cffbdd63ba6-0000
> I1015 13:15:35.256165 19640 slave.cpp:3544] Cleaning up executor 'task-1' of 
> framework e4de5b96-41cc-4713-af44-7cffbdd63ba6-0000
> I1015 13:15:35.256273 19641 gc.cpp:56] Scheduling 
> '/home/ben.whitehead/opt/mesos/work/slave/work_dir/slaves/e4de5b96-41cc-4713-af44-7cffbdd63ba6-S0/frameworks/e4de5b96-41cc-4713-af44-7cffbdd63ba6-0000/executors/task-1/runs/3958ff84-8dd9-4c3c-995d-5aba5250541b'
>  for gc 6.99999703411852days in the future
> I1015 13:15:35.256283 19640 slave.cpp:3633] Cleaning up framework 
> e4de5b96-41cc-4713-af44-7cffbdd63ba6-0000
> I1015 13:15:35.256340 19641 gc.cpp:56] Scheduling 
> '/home/ben.whitehead/opt/mesos/work/slave/work_dir/slaves/e4de5b96-41cc-4713-af44-7cffbdd63ba6-S0/frameworks/e4de5b96-41cc-4713-af44-7cffbdd63ba6-0000/executors/task-1'
>  for gc 6.99999703386667days in the future
> I1015 13:15:35.256350 19634 status_update_manager.cpp:284] Closing status 
> update streams for framework e4de5b96-41cc-4713-af44-7cffbdd63ba6-0000
> I1015 13:15:35.256377 19641 gc.cpp:56] Scheduling 
> '/home/ben.whitehead/opt/mesos/work/slave/work_dir/slaves/e4de5b96-41cc-4713-af44-7cffbdd63ba6-S0/frameworks/e4de5b96-41cc-4713-af44-7cffbdd63ba6-0000'
>  for gc 6.99999703291556days in the future
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
