[ https://issues.apache.org/jira/browse/MESOS-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14094278#comment-14094278 ]
Julien Eid commented on MESOS-1565:
-----------------------------------

[~tnachen] I've run into this issue as well and it doesn't have a Docker label, if you're interested.

> Improve error message for external containerizer when containerizer_path
> results in command not found (status: 127)
> -------------------------------------------------------------------------------------------------------------------
>
> Key: MESOS-1565
> URL: https://issues.apache.org/jira/browse/MESOS-1565
> Project: Mesos
> Issue Type: Improvement
> Components: containerization
> Affects Versions: 0.19.0
> Reporter: Ben Whitehead
>
> When attempting to run mesos-slave with an external containerizer with a bad
> containerizer_path the error message is misleading as to what the real
> problem is.
> It would be nice if the containerizer code could detect exit code 127 and
> have an error message to the effect of "Command not found:
> <containerizer_path>"
> Below is a log file illustrating the scenario I ran into.
> {noformat}
> mesos-slave.sh --log_dir=/tmp/mesos/slave/log_dir
> --master=zk://localhost:2181/mesos --work_dir=/tmp/mesos/slave/work_dir
> --containerizer_path=/usr/local/bin/deimos --isolation=external
> I0707 17:09:00.525806 29499 logging.cpp:167] INFO level logging started!
> I0707 17:09:00.525997 29499 main.cpp:126] Build: 2014-06-12 18:09:59 by
> ben.whitehead
> I0707 17:09:00.526013 29499 main.cpp:128] Version: 0.19.0
> I0707 17:09:00.526023 29499 main.cpp:131] Git tag: 0.19.0
> I0707 17:09:00.526033 29499 main.cpp:135] Git SHA:
> 51e047524cf744ee257870eb479345646c0428ff
> I0707 17:09:00.526167 29499 main.cpp:149] Starting Mesos slave
> 2014-07-07 17:09:00,526:29499(0x7fc0a747f700):ZOO_INFO@log_env@712: Client
> environment:zookeeper.version=zookeeper C client 3.4.5
> 2014-07-07 17:09:00,526:29499(0x7fc0a747f700):ZOO_INFO@log_env@716: Client
> environment:host.name=xxxxxx
> 2014-07-07 17:09:00,526:29499(0x7fc0a747f700):ZOO_INFO@log_env@723: Client
> environment:os.name=Linux
> 2014-07-07 17:09:00,526:29499(0x7fc0a747f700):ZOO_INFO@log_env@724: Client
> environment:os.arch=3.11.10-17-desktop
> 2014-07-07 17:09:00,526:29499(0x7fc0a747f700):ZOO_INFO@log_env@725: Client
> environment:os.version=#1 SMP PREEMPT Mon Jun 16 15:28:13 UTC 2014 (fba7c1f)
> 2014-07-07 17:09:00,526:29499(0x7fc0a747f700):ZOO_INFO@log_env@733: Client
> environment:user.name=ben.whitehead
> 2014-07-07 17:09:00,526:29499(0x7fc0a747f700):ZOO_INFO@log_env@741: Client
> environment:user.home=/home/ben.whitehead
> 2014-07-07 17:09:00,526:29499(0x7fc0a747f700):ZOO_INFO@log_env@753: Client
> environment:user.dir=/home/ben.whitehead/tmp/mesos/mesos/build/bin
> 2014-07-07 17:09:00,526:29499(0x7fc0a747f700):ZOO_INFO@zookeeper_init@786:
> Initiating client connection, host=localhost:2181 sessionTimeout=10000
> watcher=0x7fc0ae7d59b0 sessionId=0 sessionPasswd=<null>
> context=0x7fc09c0008e0 flags=0
> 2014-07-07 17:09:00,526:29499(0x7fc0a5c7c700):ZOO_INFO@check_events@1703:
> initiated connection to server [127.0.0.1:2181]
> I0707 17:09:00.526564 29520 slave.cpp:143] Slave started on 1)@127.0.0.2:5051
> I0707 17:09:00.526713 29520 slave.cpp:255] Slave resources: cpus(*):8;
> mem(*):14750; disk(*):221168; ports(*):[31000-32000]
> I0707 17:09:00.526747 29520 slave.cpp:283] Slave hostname: xxxxxx
> I0707 17:09:00.526757 29520 slave.cpp:284] Slave checkpoint: true
> I0707 17:09:00.527842 29518 state.cpp:33] Recovering state from
> '/tmp/mesos/slave/work_dir/meta'
> I0707 17:09:00.528142 29516 status_update_manager.cpp:193] Recovering status
> update manager
> I0707 17:09:00.528244 29517 external_containerizer.cpp:247] Recovering
> containerizer
> 2014-07-07 17:09:00,544:29499(0x7fc0a5c7c700):ZOO_INFO@check_events@1750:
> session establishment complete on server [127.0.0.1:2181],
> sessionId=0x14712df6ba4000e, negotiated timeout=10000
> I0707 17:09:00.544852 29516 group.cpp:310] Group process ((4)@127.0.0.2:5051)
> connected to ZooKeeper
> I0707 17:09:00.544888 29516 group.cpp:784] Syncing group operations: queue
> size (joins, cancels, datas) = (0, 0, 0)
> I0707 17:09:00.544900 29516 group.cpp:382] Trying to create path '/mesos' in
> ZooKeeper
> I0707 17:09:00.545446 29518 detector.cpp:135] Detected a new leader: (id='0')
> I0707 17:09:00.545524 29515 group.cpp:655] Trying to get
> '/mesos/info_0000000000' in ZooKeeper
> I0707 17:09:00.545805 29517 detector.cpp:377] A new leading master
> (UPID=master@127.0.0.2:5050) is detected
> Failed to perform recovery: Recover failed: External containerizer failed
> (status: 127)
> To remedy this do as follows:
> Step 1: rm -f /tmp/mesos/slave/work_dir/meta/slaves/latest
> This ensures slave doesn't recover old live executors.
> Step 2: Restart the slave.
> {noformat}

--
This message was sent by Atlassian JIRA (v6.2#6252)
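
For illustration, the check requested in the issue (detect exit code 127 and report "Command not found: <containerizer_path>") could look roughly like the sketch below. It is written in Python for brevity rather than in Mesos's own C++, and it is not the Mesos implementation: the function names, the constant, and the way the program is launched through a shell are all assumptions made for this example. It relies only on the POSIX shell convention that exit status 127 means the command could not be found.

```python
import shlex
import subprocess
from typing import Optional

# POSIX shells exit with 127 when the requested command cannot be found.
COMMAND_NOT_FOUND = 127


def describe_containerizer_failure(status: int, containerizer_path: str) -> str:
    """Map a containerizer exit status to an error message (hypothetical helper)."""
    if status == COMMAND_NOT_FOUND:
        # The more specific message proposed in the issue.
        return f"Command not found: {containerizer_path}"
    # Otherwise keep the generic message format seen in the log above.
    return f"External containerizer failed (status: {status})"


def run_containerizer(containerizer_path: str, command: str = "recover") -> Optional[str]:
    """Run the external program through a shell; return an error message or None.

    The "recover" default is an assumption for this sketch.
    """
    shell_command = f"{shlex.quote(containerizer_path)} {shlex.quote(command)}"
    result = subprocess.run(shell_command, shell=True, stderr=subprocess.DEVNULL)
    if result.returncode == 0:
        return None
    return describe_containerizer_failure(result.returncode, containerizer_path)
```

On a host where /usr/local/bin/deimos does not exist, run_containerizer("/usr/local/bin/deimos") would be expected to return the "Command not found" message instead of the bare status. Note that a program can also exit with 127 on its own, so the message is a strong hint rather than a certainty.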