Karsten created MESOS-8585:
------------------------------

             Summary: Agent Crashes When Asked to Start Task with Unknown User
                 Key: MESOS-8585
                 URL: https://issues.apache.org/jira/browse/MESOS-8585
             Project: Mesos
          Issue Type: Bug
          Components: agent
    Affects Versions: 1.5.0
            Reporter: Karsten


The Marathon team has an integration test that tries to start a task with an 
unknown user. The test expects {{TASK_FAILED}}, but we see {{TASK_DROPPED}} 
instead. The agent logs indicate that the agent aborts on a failed 
{{CHECK_SOME}} while creating the executor directory and is then restarted.

 

{code}
2018-02-14 14:55:45: I0214 14:55:45.319974  6213 slave.cpp:2542] Launching task 'sleep-bad-user-7.228ba17d-1197-11e8-baca-6a2835f12cb6' for framework 120721e5-96e5-4c0b-8660-d5ba2e96f05a-0001
2018-02-14 14:55:45: I0214 14:55:45.320605  6213 paths.cpp:727] Creating sandbox '/var/lib/mesos/slave/slaves/120721e5-96e5-4c0b-8660-d5ba2e96f05a-S3/frameworks/120721e5-96e5-4c0b-8660-d5ba2e96f05a-0001/executors/sleep-bad-user-7.228ba17d-1197-11e8-baca-6a2835f12cb6/runs/dc99056a-1d85-427f-a34b-ac666d4acc88' for user 'bad'
2018-02-14 14:55:45: F0214 14:55:45.321131  6213 paths.cpp:735] CHECK_SOME(mkdir): Failed to chown directory to 'bad': No such user 'bad' Failed to create executor directory '/var/lib/mesos/slave/slaves/120721e5-96e5-4c0b-8660-d5ba2e96f05a-S3/frameworks/120721e5-96e5-4c0b-8660-d5ba2e96f05a-0001/executors/sleep-bad-user-7.228ba17d-1197-11e8-baca-6a2835f12cb6/runs/dc99056a-1d85-427f-a34b-ac666d4acc88'
2018-02-14 14:55:45: *** Check failure stack trace: ***
2018-02-14 14:55:45:     @     0x7f72033444ad  google::LogMessage::Fail()
2018-02-14 14:55:45:     @     0x7f72033462dd  google::LogMessage::SendToLog()
2018-02-14 14:55:45:     @     0x7f720334409c  google::LogMessage::Flush()
2018-02-14 14:55:45:     @     0x7f7203346bd9  google::LogMessageFatal::~LogMessageFatal()
2018-02-14 14:55:45:     @     0x56544ca378f9  _CheckFatal::~_CheckFatal()
2018-02-14 14:55:45:     @     0x7f720270f30d  mesos::internal::slave::paths::createExecutorDirectory()
2018-02-14 14:55:45:     @     0x7f720273812c  mesos::internal::slave::Framework::addExecutor()
2018-02-14 14:55:45:     @     0x7f7202753e35  mesos::internal::slave::Slave::__run()
2018-02-14 14:55:45:     @     0x7f7202764292  _ZNO6lambda12CallableOnceIFvPN7process11ProcessBaseEEE10CallableFnINS_8internal7PartialIZNS1_8dispatchIN5mesos8internal5slave5SlaveERKNS1_6FutureISt4listIbSaIbEEEERKNSA_13FrameworkInfoERKNSA_12ExecutorInfoERK6OptionINSA_8TaskInfoEERKSR_INSA_13TaskGroupInfoEERKSt6vectorINSB_19ResourceVersionUUIDESaIS11_EESK_SN_SQ_SV_SZ_S15_EEvRKNS1_3PIDIT_EEMS17_FvT0_T1_T2_T3_T4_T5_EOT6_OT7_OT8_OT9_OT10_OT11_EUlOSI_OSL_OSO_OST_OSX_OS13_S3_E_ISI_SL_SO_ST_SX_S13_St12_PlaceholderILi1EEEEEEclEOS3_
2018-02-14 14:55:45:     @     0x7f72032a2b11  process::ProcessBase::consume()
2018-02-14 14:55:45:     @     0x7f72032b183c  process::ProcessManager::resume()
2018-02-14 14:55:45:     @     0x7f72032b6da6  _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUlvE_vEEE6_M_runEv
2018-02-14 14:55:45:     @     0x7f72005ced73  (unknown)
2018-02-14 14:55:45:     @     0x7f72000cf52c  (unknown)
2018-02-14 14:55:45:     @     0x7f71ffe0d1dd  (unknown)
2018-02-14 14:57:15: dcos-mesos-slave.service: Main process exited, code=killed, status=6/ABRT
2018-02-14 14:57:15: dcos-mesos-slave.service: Unit entered failed state.
2018-02-14 14:57:15: dcos-mesos-slave.service: Failed with result 'signal'.
2018-02-14 14:57:20: dcos-mesos-slave.service: Service hold-off time over, scheduling restart.
2018-02-14 14:57:20: Stopped Mesos Agent: distributed systems kernel agent.
2018-02-14 14:57:20: Starting Mesos Agent: distributed systems kernel agent...

{code}
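
To illustrate the behaviour the test expects, here is a minimal sketch in Python. It is not Mesos code, and the names create_sandbox and launch_task are hypothetical. It assumes a Unix host and shows the unknown-user lookup being returned as an ordinary error that the caller maps to TASK_FAILED, rather than being treated as a fatal check that takes the whole process down.

```python
import os
import pwd
import tempfile


def create_sandbox(path, user):
    """Create `path` and chown it to `user`.

    Returns None on success, or an error string on failure, so the
    caller can fail the single task instead of aborting the process.
    """
    try:
        entry = pwd.getpwnam(user)  # raises KeyError for an unknown user
    except KeyError:
        return "No such user '%s'" % user

    os.makedirs(path, exist_ok=True)
    try:
        os.chown(path, entry.pw_uid, entry.pw_gid)
    except OSError as e:
        return "Failed to chown '%s' to '%s': %s" % (path, user, e)
    return None


def launch_task(sandbox, user):
    """Hypothetical launch path: a sandbox error becomes TASK_FAILED."""
    error = create_sandbox(sandbox, user)
    return "TASK_FAILED" if error is not None else "TASK_RUNNING"


if __name__ == "__main__":
    sandbox = os.path.join(tempfile.mkdtemp(), "runs", "example")
    # 'no-such-user-mesos-8585' is assumed not to exist on this host.
    print(launch_task(sandbox, "no-such-user-mesos-8585"))
```

The user is looked up before the directory is created, so a bad user name leaves nothing behind on disk in this sketch. Whether the agent should behave the same way is for the fix to decide.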



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
