After a few attempts, and after adding debugging to libmesos.so, I was able to determine a few things:

1. The mesos-container-executor path was mismatched. The mesos slave assumed it would be in /usr/local/libexec/mesos, but it ended up in /usr/libexec/mesos. I believe this is likely a quirk of my setup.

2. The "Argument list too long" error: I believe myriad-config-default.yml expects a few mandatory attributes. YARN_NODEMANAGER_OPTS is probably needed, even if it is empty, based on the code I am reading. It also seems related to the issue below. This could be a red herring, but it looks as though the options kept being appended recursively until libprocess failed to spawn.

https://issues.apache.org/jira/browse/MYRIAD-125

myriad-scheduler/src/main/java/org/apache/myriad/scheduler/NMExecutorCLGenImpl.java:

    protected void addYarnNodemanagerOpt(String propertyName, String propertyValue) {
      String envOpt = String.format(PROPERTY_FORMAT, propertyName, propertyValue);
      if (environment.containsKey(ENV_YARN_NODEMANAGER_OPTS)) {
        String existingOpts = environment.get(ENV_YARN_NODEMANAGER_OPTS);
        environment.put(ENV_YARN_NODEMANAGER_OPTS, existingOpts + " " + envOpt);
      } else {
        environment.put(ENV_YARN_NODEMANAGER_OPTS, envOpt);
      }
    }
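To make the suspicion about YARN_NODEMANAGER_OPTS concrete, here is a minimal sketch, not Myriad's actual code, of how the append-to-existing-value pattern in addYarnNodemanagerOpt grows the variable if the same environment map keeps being reused. The class name NodemanagerOptsGrowth, the helper appendsUntilTooLong, the property name example.property, and the "-D%s=%s" value for PROPERTY_FORMAT are all assumptions made for illustration. The 128 KiB figure is the per-string limit I would expect on Linux with 4 KiB pages (MAX_ARG_STRLEN); whether Myriad really reuses the map this way is exactly what I have not confirmed.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch only: mimics the append pattern quoted from NMExecutorCLGenImpl to
 * show how YARN_NODEMANAGER_OPTS grows when the same map is reused.
 */
public class NodemanagerOptsGrowth {
    static final String ENV_YARN_NODEMANAGER_OPTS = "YARN_NODEMANAGER_OPTS";
    // Assumption: the real PROPERTY_FORMAT is not shown in this thread.
    static final String PROPERTY_FORMAT = "-D%s=%s";

    /** Same append logic as the quoted method, with the map passed in. */
    static void addYarnNodemanagerOpt(Map<String, String> environment,
                                      String propertyName, String propertyValue) {
        String envOpt = String.format(PROPERTY_FORMAT, propertyName, propertyValue);
        if (environment.containsKey(ENV_YARN_NODEMANAGER_OPTS)) {
            String existingOpts = environment.get(ENV_YARN_NODEMANAGER_OPTS);
            environment.put(ENV_YARN_NODEMANAGER_OPTS, existingOpts + " " + envOpt);
        } else {
            environment.put(ENV_YARN_NODEMANAGER_OPTS, envOpt);
        }
    }

    /** Counts how many appends of one option it takes to exceed limitChars. */
    static int appendsUntilTooLong(String name, String value, int limitChars) {
        Map<String, String> environment = new HashMap<>();
        int appends = 0;
        do {
            addYarnNodemanagerOpt(environment, name, value);
            appends++;
        } while (environment.get(ENV_YARN_NODEMANAGER_OPTS).length() <= limitChars);
        return appends;
    }

    public static void main(String[] args) {
        // Assumed limit: 128 KiB per environment string (Linux, 4 KiB pages).
        int limit = 128 * 1024;
        int appends = appendsUntilTooLong("example.property", "value", limit);
        System.out.println("appends needed to exceed " + limit + " chars: " + appends);
    }
}
```

If something like this is happening, printing the length of YARN_NODEMANAGER_OPTS just before the executor is launched should show it growing from one launch attempt to the next.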
3. JAVA_HOME seems to be required on Ubuntu 14.04.

4. With the above changes, I was able to get the NodeManager launched, but for some reason the mesos slave keeps killing the executor because of the executor registration timeout. The NodeManager logs indicate that it sent its registration, and the ResourceManager receives the registration, but after 3 mins the slave kills the executor and spawns another NodeManager, and this keeps going on and on.

RM:
2015-12-04 15:39:50,962 INFO org.apache.myriad.scheduler.event.handlers.ResourceOffersEventHandler: Launching task: nm.medium.65f4c532-816a-4e5d-b18d-98649c0dac5d using offer: value: "3411a8f9-c9e9-483c-912e-c78c89f948d8-O398"
2015-12-04 15:39:53,337 INFO org.apache.hadoop.yarn.util.RackResolver: Resolved ubuntu1.dev to /default-rack
2015-12-04 15:39:53,337 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: NodeManager from node ubuntu1.dev(cmPort: 31061 httpPort: 31340) registered with capability: <memory:4096, vCores:4>, assigned nodeId ubuntu1.dev:31061
2015-12-04 15:39:53,338 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: ubuntu1.dev:31061 Node Transitioned from NEW to RUNNING
2015-12-04 15:39:53,342 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Added node ubuntu1.dev:31061 cluster capacity: <memory:20480, vCores:20>

SLAVE:
==> mesos-slave.INFO <==
I1204 15:39:50.205615 30329 slave.cpp:3882] Terminating executor myriad_executor3411a8f9-c9e9-483c-912e-c78c89f948d8-00003411a8f9-c9e9-483c-912e-c78c89f948d8-O3603411a8f9-c9e9-483c-912e-c78c89f948d8-S0 of framework 3411a8f9-c9e9-483c-912e-c78c89f948d8-0000 because it did not register within 3mins

==> /var/log/syslog <==
Dec 4 15:39:50 ubuntu1 mesos-slave[30307]: I1204 15:39:50.205615 30329 slave.cpp:3882] Terminating executor myriad_executor3411a8f9-c9e9-483c-912e-c78c89f948d8-00003411a8f9-c9e9-483c-912e-c78c89f948d8-O3603411a8f9-c9e9-483c-912e-c78c89f948d8-S0 of framework 
3411a8f9-c9e9-483c-912e-c78c89f948d8-0000 because it did not register within 3mins ==> mesos-slave.INFO <== I1204 15:39:50.205842 30329 containerizer.cpp:1097] Destroying container '7452efa7-98c7-4399-ae73-2155aac31229' ==> /var/log/syslog <== Dec 4 15:39:50 ubuntu1 mesos-slave[30307]: I1204 15:39:50.205842 30329 containerizer.cpp:1097] Destroying container '7452efa7-98c7-4399-ae73-2155aac31229' ==> mesos-slave.INFO <== I1204 15:39:50.207988 30330 cgroups.cpp:2433] Freezing cgroup /sys/fs/cgroup/freezer/mesos/7452efa7-98c7-4399-ae73-2155aac31229 ==> /var/log/syslog <== Dec 4 15:39:50 ubuntu1 mesos-slave[30307]: I1204 15:39:50.207988 30330 cgroups.cpp:2433] Freezing cgroup /sys/fs/cgroup/freezer/mesos/7452efa7-98c7-4399-ae73-2155aac31229 ==> mesos-slave.INFO <== I1204 15:39:50.209980 30324 cgroups.cpp:1415] Successfully froze cgroup /sys/fs/cgroup/freezer/mesos/7452efa7-98c7-4399-ae73-2155aac31229 after 1.910016ms ==> /var/log/syslog <== Dec 4 15:39:50 ubuntu1 mesos-slave[30307]: I1204 15:39:50.209980 30324 cgroups.cpp:1415] Successfully froze cgroup /sys/fs/cgroup/freezer/mesos/7452efa7-98c7-4399-ae73-2155aac31229 after 1.910016ms ==> mesos-slave.INFO <== I1204 15:39:50.212118 30330 cgroups.cpp:2450] Thawing cgroup /sys/fs/cgroup/freezer/mesos/7452efa7-98c7-4399-ae73-2155aac31229 ==> /var/log/syslog <== Dec 4 15:39:50 ubuntu1 mesos-slave[30307]: I1204 15:39:50.212118 30330 cgroups.cpp:2450] Thawing cgroup /sys/fs/cgroup/freezer/mesos/7452efa7-98c7-4399-ae73-2155aac31229 ==> mesos-slave.INFO <== I1204 15:39:50.219007 30330 cgroups.cpp:1444] Successfullly thawed cgroup /sys/fs/cgroup/freezer/mesos/7452efa7-98c7-4399-ae73-2155aac31229 after 6.827008ms I1204 15:39:50.219089 30323 containerizer.cpp:1284] Executor for container '7452efa7-98c7-4399-ae73-2155aac31229' has exited ==> /var/log/syslog <== Dec 4 15:39:50 ubuntu1 mesos-slave[30307]: I1204 15:39:50.219007 30330 cgroups.cpp:1444] Successfullly thawed cgroup 
/sys/fs/cgroup/freezer/mesos/7452efa7-98c7-4399-ae73-2155aac31229 after 6.827008ms Dec 4 15:39:50 ubuntu1 mesos-slave[30307]: I1204 15:39:50.219089 30323 containerizer.cpp:1284] Executor for container '7452efa7-98c7-4399-ae73-2155aac31229' has exited ==> mesos-slave.INFO <== I1204 15:39:50.221758 30324 slave.cpp:3440] Executor 'myriad_executor3411a8f9-c9e9-483c-912e-c78c89f948d8-00003411a8f9-c9e9-483c-912e-c78c89f948d8-O3603411a8f9-c9e9-483c-912e-c78c89f948d8-S0' of framework 3411a8f9-c9e9-483c-912e-c78c89f948d8-0000 terminated with signal Killed I1204 15:39:50.221938 30324 slave.cpp:2717] Handling status update TASK_FAILED (UUID: e9d9dff0-41d1-450c-9e28-f83133e656a0) for task nm.medium.65f4c532-816a-4e5d-b18d-98649c0dac5d of framework 3411a8f9-c9e9-483c-912e-c78c89f948d8-0000 from @0.0.0.0:0 E1204 15:39:50.222599 30324 slave.cpp:2911] Failed to update resources for container 7452efa7-98c7-4399-ae73-2155aac31229 of executor myriad_executor3411a8f9-c9e9-483c-912e-c78c89f948d8-00003411a8f9-c9e9-483c-912e-c78c89f948d8-O3603411a8f9-c9e9-483c-912e-c78c89f948d8-S0 running task nm.medium.65f4c532-816a-4e5d-b18d-98649c0dac5d on status update for terminal task, destroying container: Container '7452efa7-98c7-4399-ae73-2155aac31229' not found W1204 15:39:50.222831 30324 composing.cpp:514] Container '7452efa7-98c7-4399-ae73-2155aac31229' not found I1204 15:39:50.222864 30324 status_update_manager.cpp:322] Received status update TASK_FAILED (UUID: e9d9dff0-41d1-450c-9e28-f83133e656a0) for task nm.medium.65f4c532-816a-4e5d-b18d-98649c0dac5d of framework 3411a8f9-c9e9-483c-912e-c78c89f948d8-0000 I1204 15:39:50.223323 30324 slave.cpp:3016] Forwarding the update TASK_FAILED (UUID: e9d9dff0-41d1-450c-9e28-f83133e656a0) for task nm.medium.65f4c532-816a-4e5d-b18d-98649c0dac5d of framework 3411a8f9-c9e9-483c-912e-c78c89f948d8-0000 to master@192.168.1.110:5050 ==> /var/log/syslog <== Dec 4 15:39:50 ubuntu1 mesos-slave[30307]: I1204 15:39:50.221758 30324 slave.cpp:3440] Executor 
'myriad_executor3411a8f9-c9e9-483c-912e-c78c89f948d8-00003411a8f9-c9e9-483c-912e-c78c89f948d8-O3603411a8f9-c9e9-483c-912e-c78c89f948d8-S0' of framework 3411a8f9-c9e9-483c-912e-c78c89f948d8-0000 terminated with signal Killed Dec 4 15:39:50 ubuntu1 mesos-slave[30307]: I1204 15:39:50.221938 30324 slave.cpp:2717] Handling status update TASK_FAILED (UUID: e9d9dff0-41d1-450c-9e28-f83133e656a0) for task nm.medium.65f4c532-816a-4e5d-b18d-98649c0dac5d of framework 3411a8f9-c9e9-483c-912e-c78c89f948d8-0000 from @0.0.0.0:0 Dec 4 15:39:50 ubuntu1 mesos-slave[30307]: E1204 15:39:50.222599 30324 slave.cpp:2911] Failed to update resources for container 7452efa7-98c7-4399-ae73-2155aac31229 of executor myriad_executor3411a8f9-c9e9-483c-912e-c78c89f948d8-00003411a8f9-c9e9-483c-912e-c78c89f948d8-O3603411a8f9-c9e9-483c-912e-c78c89f948d8-S0 running task nm.medium.65f4c532-816a-4e5d-b18d-98649c0dac5d on status update for terminal task, destroying container: Container '7452efa7-98c7-4399-ae73-2155aac31229' not found Dec 4 15:39:50 ubuntu1 mesos-slave[30307]: W1204 15:39:50.222831 30324 composing.cpp:514] Container '7452efa7-98c7-4399-ae73-2155aac31229' not found Dec 4 15:39:50 ubuntu1 mesos-slave[30307]: I1204 15:39:50.222864 30324 status_update_manager.cpp:322] Received status update TASK_FAILED (UUID: e9d9dff0-41d1-450c-9e28-f83133e656a0) for task nm.medium.65f4c532-816a-4e5d-b18d-98649c0dac5d of framework 3411a8f9-c9e9-483c-912e-c78c89f948d8-0000 Dec 4 15:39:50 ubuntu1 mesos-slave[30307]: I1204 15:39:50.223323 30324 slave.cpp:3016] Forwarding the update TASK_FAILED (UUID: e9d9dff0-41d1-450c-9e28-f83133e656a0) for task nm.medium.65f4c532-816a-4e5d-b18d-98649c0dac5d of framework 3411a8f9-c9e9-483c-912e-c78c89f948d8-0000 to master@192.168.1.110:5050 Executor: 15/12/04 15:24:48 INFO ipc.Server: IPC Server Responder: starting 15/12/04 15:24:48 INFO ipc.Server: IPC Server listener on 31208: starting 15/12/04 15:24:48 INFO security.NMContainerTokenSecretManager: Updating node address : 
ubuntu1.dev:31208 15/12/04 15:24:48 INFO ipc.CallQueueManager: Using callQueue class java.util.concurrent.LinkedBlockingQueue 15/12/04 15:24:48 INFO ipc.Server: Starting Socket Reader #1 for port 31133 15/12/04 15:24:48 INFO pb.RpcServerFactoryPBImpl: Adding protocol org.apache.hadoop.yarn.server.nodemanager.api.LocalizationProtocolPB to the server 15/12/04 15:24:48 INFO ipc.Server: IPC Server Responder: starting 15/12/04 15:24:48 INFO ipc.Server: IPC Server listener on 31133: starting 15/12/04 15:24:48 INFO localizer.ResourceLocalizationService: Localizer started on port 31133 15/12/04 15:24:48 INFO mapred.IndexCache: IndexCache created with max memory = 10485760 15/12/04 15:24:49 INFO mapred.ShuffleHandler: httpshuffle listening on port 31007 15/12/04 15:24:49 INFO containermanager.ContainerManagerImpl: ContainerManager started at ubuntu1.dev/192.168.1.33:31208 15/12/04 15:24:49 INFO containermanager.ContainerManagerImpl: ContainerManager bound to 0.0.0.0/0.0.0.0:31208 15/12/04 15:24:49 INFO webapp.WebServer: Instantiating NMWebApp at 0.0.0.0:31550 15/12/04 15:24:49 INFO mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog 15/12/04 15:24:49 INFO server.AuthenticationFilter: Unable to initialize FileSignerSecretProvider, falling back to use random secrets. 
15/12/04 15:24:49 INFO http.HttpRequestLog: Http request log for http.requests.nodemanager is not defined 15/12/04 15:24:49 INFO http.HttpServer2: Added global filter 'safety' (class=org.apache.hadoop.http.HttpServer2$QuotingInputFilter) 15/12/04 15:24:49 INFO http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context node 15/12/04 15:24:49 INFO http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context static 15/12/04 15:24:49 INFO http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context logs 15/12/04 15:24:49 INFO http.HttpServer2: adding path spec: /node/* 15/12/04 15:24:49 INFO http.HttpServer2: adding path spec: /ws/* 15/12/04 15:24:49 INFO http.HttpServer2: Jetty bound to port 31550 15/12/04 15:24:49 INFO mortbay.log: jetty-6.1.26 15/12/04 15:24:49 INFO mortbay.log: Extract jar:file:/media/ubuntu/drive/oshome/ubuntu/deps/hadoop-2.7.1/share/hadoop/yarn/hadoop-yarn-common-2.7.1.jar!/webapps/node to /tmp/Jetty_0_0_0_0_31550_node____.4y6r65/webapp 15/12/04 15:24:49 INFO mortbay.log: Started HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:31550 15/12/04 15:24:49 INFO webapp.WebApps: Web app /node started at 31550 15/12/04 15:24:49 INFO webapp.WebApps: Registered webapp guice modules 15/12/04 15:24:49 INFO client.RMProxy: Connecting to ResourceManager at /192.168.1.110:8031 15/12/04 15:24:49 INFO nodemanager.NodeStatusUpdaterImpl: Sending out 0 NM container statuses: [] 15/12/04 15:24:49 INFO nodemanager.NodeStatusUpdaterImpl: Registering with RM using containers :[] 15/12/04 15:24:49 INFO security.NMContainerTokenSecretManager: Rolling master-key for container-tokens, got key with id 325621781 15/12/04 15:24:49 INFO security.NMTokenSecretManagerInNM: Rolling master-key for container-tokens, got key with id -126470083 15/12/04 15:24:49 
INFO nodemanager.NodeStatusUpdaterImpl: Registered with ResourceManager as ubuntu1.dev:31208 with total resource of <memory:4096, vCores:4>
15/12/04 15:24:49 INFO nodemanager.NodeStatusUpdaterImpl: Notifying ContainerManager to unblock new container-requests
Dec 04, 2015 3:26:59 PM com.google.inject.servlet.InternalServletModule$BackwardsCompatibleServletContextProvider get

On Fri, Dec 4, 2015 at 11:32 AM, Prabhu Inbarajan <inbarajan.pra...@gmail.com> wrote:

> Seems OK to me, but I wasn't really clear on frameworkUser and
> frameworkSuperUser; both have the necessary permissions. I am also trying
> out the dockerized resource manager from Brandon. Apologies for the
> verbosity...
>
> myriad-config-default.yml
>
> mesosMaster: 192.168.1.110:5050
> checkpoint: false
> frameworkFailoverTimeout: 43200000
> frameworkName: MyriadAlpha
> frameworkRole:
> frameworkUser: ubuntu
> frameworkSuperUser: root
> nativeLibrary: /usr/local/lib/libmesos.so
> zkServers: localhost:2181
> zkTimeout: 20000
> restApiPort: 8192
>
>
> resource manager logs:
> docker run --net=host -e HADOOP_VER=2.7.1 -e HADOOP_USER=ubuntu -e HADOOP_NAMENODE=192.168.1.110 -t -v ~/Dev/proto/incubator-myriad/docker/config:/myriad-conf prabhuinbarajan/myriad
> -----------------
> 15/12/04 19:17:21 INFO handlers.ResourceOffersEventHandler: Offer not sufficient for task with, cpu: 4.4, memory: 5504.0, ports: 997
> 15/12/04 19:17:24 INFO resourcemanager.ClientRMService: Allocated new applicationId: 1
> 15/12/04 19:17:24 INFO handlers.ResourceOffersEventHandler: Received offers 1
> 15/12/04 19:17:24 INFO handlers.ResourceOffersEventHandler: Offer not sufficient for task with, cpu: 4.4, memory: 5504.0, ports: 999
> 15/12/04 19:17:27 INFO handlers.ResourceOffersEventHandler: Received offers 1
> 15/12/04 19:17:27 INFO handlers.ResourceOffersEventHandler: Offer not sufficient for task with, cpu: 4.4, memory: 5504.0, ports: 997
> 15/12/04 19:17:29 INFO handlers.ResourceOffersEventHandler: Received > 
offers 1 > 15/12/04 19:17:29 INFO handlers.ResourceOffersEventHandler: Offer not > sufficient for task with, cpu: 4.4, memory: 5504.0, ports: 999 > 15/12/04 19:17:32 INFO handlers.ResourceOffersEventHandler: Received > offers 1 > 15/12/04 19:17:32 INFO handlers.ResourceOffersEventHandler: Offer not > sufficient for task with, cpu: 4.4, memory: 5504.0, ports: 997 > 15/12/04 19:17:34 INFO handlers.ResourceOffersEventHandler: Received > offers 1 > 15/12/04 19:17:34 INFO handlers.ResourceOffersEventHandler: Offer not > sufficient for task with, cpu: 4.4, memory: 5504.0, ports: 999 > 15/12/04 19:17:37 INFO handlers.ResourceOffersEventHandler: Received > offers 1 > 15/12/04 19:17:37 INFO handlers.ResourceOffersEventHandler: Offer not > sufficient for task with, cpu: 4.4, memory: 5504.0, ports: 997 > 15/12/04 19:17:39 INFO handlers.ResourceOffersEventHandler: Received > offers 1 > 15/12/04 19:17:39 INFO handlers.ResourceOffersEventHandler: Offer not > sufficient for task with, cpu: 4.4, memory: 5504.0, ports: 999 > 15/12/04 19:17:39 INFO handlers.ResourceOffersEventHandler: Offer not > sufficient for task with, cpu: 4.4, memory: 5504.0, ports: 999 > 15/12/04 19:17:40 INFO resourcemanager.ClientRMService: Application with > id 1 submitted by user ubuntu > 15/12/04 19:17:40 INFO rmapp.RMAppImpl: Storing application with id > application_1449255931427_0001 > 15/12/04 19:17:40 INFO resourcemanager.RMAuditLogger: USER=ubuntu > IP=192.168.1.33 OPERATION=Submit Application Request > TARGET=ClientRMService RESULT=SUCCESS > APPID=application_1449255931427_0001 > 15/12/04 19:17:40 INFO rmapp.RMAppImpl: application_1449255931427_0001 > State change from NEW to NEW_SAVING > 15/12/04 19:17:40 INFO recovery.RMStateStore: Storing info for app: > application_1449255931427_0001 > 15/12/04 19:17:40 INFO rmapp.RMAppImpl: application_1449255931427_0001 > State change from NEW_SAVING to SUBMITTED > 15/12/04 19:17:40 WARN security.UserGroupInformation: No groups available > for user ubuntu > 
15/12/04 19:17:40 INFO fair.FairScheduler: Accepted application > application_1449255931427_0001 from user: ubuntu, in queue: default, > currently num of applications: 1 > 15/12/04 19:17:40 INFO rmapp.RMAppImpl: application_1449255931427_0001 > State change from SUBMITTED to ACCEPTED > 15/12/04 19:17:40 INFO resourcemanager.ApplicationMasterService: > Registering app attempt : appattempt_1449255931427_0001_000001 > 15/12/04 19:17:40 INFO attempt.RMAppAttemptImpl: > appattempt_1449255931427_0001_000001 State change from NEW to SUBMITTED > 15/12/04 19:17:40 INFO fair.FairScheduler: Added Application Attempt > appattempt_1449255931427_0001_000001 to scheduler from user: ubuntu > 15/12/04 19:17:40 INFO attempt.RMAppAttemptImpl: > appattempt_1449255931427_0001_000001 State change from SUBMITTED to > SCHEDULED > 15/12/04 19:17:42 INFO handlers.ResourceOffersEventHandler: Received > offers 1 > 15/12/04 19:17:42 INFO handlers.ResourceOffersEventHandler: Offer not > sufficient for task with, cpu: 4.4, memory: 5504.0, ports: 997 > 15/12/04 19:17:44 INFO handlers.ResourceOffersEventHandler: Received > offers 1 > 15/12/04 19:17:44 INFO handlers.ResourceOffersEventHandler: Offer not > sufficient for task with, cpu: 4.4, memory: 5504.0, ports: 999 > 15/12/04 19:17:47 INFO handlers.ResourceOffersEventHandler: Received > offers 1 > > mesos slave 1 logs: > ------------------------ > ubuntu@ubuntu1:/var/log/mesos$ tail -f mesos-slave.{ERROR,INFO,WARNING} > > ==> mesos-slave.INFO <== > W1204 11:19:25.484510 24660 slave.cpp:4505] Failed to get resource > statistics for executor > 'myriad_executor5958fb41-3d1c-4d8d-8d00-d82a0aa6d9e1-00015958fb41-3d1c-4d8d-8d00-d82a0aa6d9e1-O55925958fb41-3d1c-4d8d-8d00-d82a0aa6d9e1-S0' > of framework 5958fb41-3d1c-4d8d-8d00-d82a0aa6d9e1-0001: Unknown container: > d69de4bd-d696-44a8-8a1f-a1909515ef40 > W1204 11:19:28.495157 24654 slave.cpp:4505] Failed to get resource > statistics for executor > 
'myriad_executor5958fb41-3d1c-4d8d-8d00-d82a0aa6d9e1-00015958fb41-3d1c-4d8d-8d00-d82a0aa6d9e1-O55925958fb41-3d1c-4d8d-8d00-d82a0aa6d9e1-S0' > of framework 5958fb41-3d1c-4d8d-8d00-d82a0aa6d9e1-0001: Unknown container: > d69de4bd-d696-44a8-8a1f-a1909515ef40 > I1204 11:19:34.253932 24655 slave.cpp:3926] Current disk usage 36.45%. Max > allowed age: 3.748370011361713days > I1204 11:19:40.139240 24656 http.cpp:189] HTTP GET for /slave(1)/state > from 192.168.1.3:62644 with User-Agent='Mozilla/5.0 (Macintosh; Intel Mac > OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.73 > Safari/537.36' > W1204 11:19:40.169255 24658 slave.cpp:4505] Failed to get resource > statistics for executor > 'myriad_executor5958fb41-3d1c-4d8d-8d00-d82a0aa6d9e1-00015958fb41-3d1c-4d8d-8d00-d82a0aa6d9e1-O55925958fb41-3d1c-4d8d-8d00-d82a0aa6d9e1-S0' > of framework 5958fb41-3d1c-4d8d-8d00-d82a0aa6d9e1-0001: Unknown container: > d69de4bd-d696-44a8-8a1f-a1909515ef40 > W1204 11:19:40.683851 24654 slave.cpp:4505] Failed to get resource > statistics for executor > 'myriad_executor5958fb41-3d1c-4d8d-8d00-d82a0aa6d9e1-00015958fb41-3d1c-4d8d-8d00-d82a0aa6d9e1-O55925958fb41-3d1c-4d8d-8d00-d82a0aa6d9e1-S0' > of framework 5958fb41-3d1c-4d8d-8d00-d82a0aa6d9e1-0001: Unknown container: > d69de4bd-d696-44a8-8a1f-a1909515ef40 > I1204 11:20:34.254957 24655 slave.cpp:3926] Current disk usage 36.45%. Max > allowed age: 3.748366616183600days > I1204 11:21:34.255574 24657 slave.cpp:3926] Current disk usage 36.45%. Max > allowed age: 3.748362372210949days > I1204 11:22:34.256463 24659 slave.cpp:3926] Current disk usage 36.45%. Max > allowed age: 3.748358977032824days > I1204 11:23:34.257474 24659 slave.cpp:3926] Current disk usage 36.45%. 
Max > allowed age: 3.748355581854711days > > ==> mesos-slave.WARNING <== > W1204 11:19:16.630030 24655 slave.cpp:4505] Failed to get resource > statistics for executor > 'myriad_executor5958fb41-3d1c-4d8d-8d00-d82a0aa6d9e1-00015958fb41-3d1c-4d8d-8d00-d82a0aa6d9e1-O55925958fb41-3d1c-4d8d-8d00-d82a0aa6d9e1-S0' > of framework 5958fb41-3d1c-4d8d-8d00-d82a0aa6d9e1-0001: Unknown container: > d69de4bd-d696-44a8-8a1f-a1909515ef40 > W1204 11:19:17.130801 24656 slave.cpp:4505] Failed to get resource > statistics for executor > 'myriad_executor5958fb41-3d1c-4d8d-8d00-d82a0aa6d9e1-00015958fb41-3d1c-4d8d-8d00-d82a0aa6d9e1-O55925958fb41-3d1c-4d8d-8d00-d82a0aa6d9e1-S0' > of framework 5958fb41-3d1c-4d8d-8d00-d82a0aa6d9e1-0001: Unknown container: > d69de4bd-d696-44a8-8a1f-a1909515ef40 > W1204 11:19:19.390177 24658 slave.cpp:4505] Failed to get resource > statistics for executor > 'myriad_executor5958fb41-3d1c-4d8d-8d00-d82a0aa6d9e1-00015958fb41-3d1c-4d8d-8d00-d82a0aa6d9e1-O55925958fb41-3d1c-4d8d-8d00-d82a0aa6d9e1-S0' > of framework 5958fb41-3d1c-4d8d-8d00-d82a0aa6d9e1-0001: Unknown container: > d69de4bd-d696-44a8-8a1f-a1909515ef40 > W1204 11:19:19.904623 24657 slave.cpp:4505] Failed to get resource > statistics for executor > 'myriad_executor5958fb41-3d1c-4d8d-8d00-d82a0aa6d9e1-00015958fb41-3d1c-4d8d-8d00-d82a0aa6d9e1-O55925958fb41-3d1c-4d8d-8d00-d82a0aa6d9e1-S0' > of framework 5958fb41-3d1c-4d8d-8d00-d82a0aa6d9e1-0001: Unknown container: > d69de4bd-d696-44a8-8a1f-a1909515ef40 > W1204 11:19:22.916709 24656 slave.cpp:4505] Failed to get resource > statistics for executor > 'myriad_executor5958fb41-3d1c-4d8d-8d00-d82a0aa6d9e1-00015958fb41-3d1c-4d8d-8d00-d82a0aa6d9e1-O55925958fb41-3d1c-4d8d-8d00-d82a0aa6d9e1-S0' > of framework 5958fb41-3d1c-4d8d-8d00-d82a0aa6d9e1-0001: Unknown container: > d69de4bd-d696-44a8-8a1f-a1909515ef40 > W1204 11:19:24.949373 24661 slave.cpp:4505] Failed to get resource > statistics for executor > 
'myriad_executor5958fb41-3d1c-4d8d-8d00-d82a0aa6d9e1-00015958fb41-3d1c-4d8d-8d00-d82a0aa6d9e1-O55925958fb41-3d1c-4d8d-8d00-d82a0aa6d9e1-S0' > of framework 5958fb41-3d1c-4d8d-8d00-d82a0aa6d9e1-0001: Unknown container: > d69de4bd-d696-44a8-8a1f-a1909515ef40 > W1204 11:19:25.484510 24660 slave.cpp:4505] Failed to get resource > statistics for executor > 'myriad_executor5958fb41-3d1c-4d8d-8d00-d82a0aa6d9e1-00015958fb41-3d1c-4d8d-8d00-d82a0aa6d9e1-O55925958fb41-3d1c-4d8d-8d00-d82a0aa6d9e1-S0' > of framework 5958fb41-3d1c-4d8d-8d00-d82a0aa6d9e1-0001: Unknown container: > d69de4bd-d696-44a8-8a1f-a1909515ef40 > W1204 11:19:28.495157 24654 slave.cpp:4505] Failed to get resource > statistics for executor > 'myriad_executor5958fb41-3d1c-4d8d-8d00-d82a0aa6d9e1-00015958fb41-3d1c-4d8d-8d00-d82a0aa6d9e1-O55925958fb41-3d1c-4d8d-8d00-d82a0aa6d9e1-S0' > of framework 5958fb41-3d1c-4d8d-8d00-d82a0aa6d9e1-0001: Unknown container: > d69de4bd-d696-44a8-8a1f-a1909515ef40 > W1204 11:19:40.169255 24658 slave.cpp:4505] Failed to get resource > statistics for executor > 'myriad_executor5958fb41-3d1c-4d8d-8d00-d82a0aa6d9e1-00015958fb41-3d1c-4d8d-8d00-d82a0aa6d9e1-O55925958fb41-3d1c-4d8d-8d00-d82a0aa6d9e1-S0' > of framework 5958fb41-3d1c-4d8d-8d00-d82a0aa6d9e1-0001: Unknown container: > d69de4bd-d696-44a8-8a1f-a1909515ef40 > W1204 11:19:40.683851 24654 slave.cpp:4505] Failed to get resource > statistics for executor > 'myriad_executor5958fb41-3d1c-4d8d-8d00-d82a0aa6d9e1-00015958fb41-3d1c-4d8d-8d00-d82a0aa6d9e1-O55925958fb41-3d1c-4d8d-8d00-d82a0aa6d9e1-S0' > of framework 5958fb41-3d1c-4d8d-8d00-d82a0aa6d9e1-0001: Unknown container: > d69de4bd-d696-44a8-8a1f-a1909515ef40 > > > > mesos - slave 2 logs : > ------------------------------ > ubuntu@ubuntu2:/var/log/mesos$ tail -f mesos-slave.{INFO,ERROR,WARNING} > ==> mesos-slave.INFO <== > I1204 11:18:46.767786 27777 http.cpp:189] HTTP GET for /slave(1)/state > from 192.168.1.3:62649 with User-Agent='Mozilla/5.0 (Macintosh; Intel Mac > OS X 10_9_2) 
AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.73 > Safari/537.36' > I1204 11:18:50.083156 27775 http.cpp:189] HTTP GET for /slave(1)/state > from 192.168.1.3:62649 with User-Agent='Mozilla/5.0 (Macintosh; Intel Mac > OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.73 > Safari/537.36' > I1204 11:18:51.238821 27773 http.cpp:189] HTTP GET for /slave(1)/state > from 192.168.1.3:62649 with User-Agent='Mozilla/5.0 (Macintosh; Intel Mac > OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.73 > Safari/537.36' > I1204 11:18:58.242902 27778 slave.cpp:3926] Current disk usage 1.47%. Max > allowed age: 6.196790774951609days > I1204 11:19:58.244071 27779 slave.cpp:3926] Current disk usage 1.47%. Max > allowed age: 6.196764178349270days > I1204 11:20:58.244828 27778 slave.cpp:3926] Current disk usage 1.48%. Max > allowed age: 6.196737405610486days > I1204 11:21:58.246207 27777 slave.cpp:3926] Current disk usage 1.48%. Max > allowed age: 6.196710949917303days > I1204 11:22:58.247319 27777 slave.cpp:3926] Current disk usage 1.48%. Max > allowed age: 6.196684494224109days > I1204 11:23:58.248172 27775 slave.cpp:3926] Current disk usage 1.48%. Max > allowed age: 6.196657791939908days > I1204 11:24:58.249053 27776 slave.cpp:3926] Current disk usage 1.48%. 
Max > allowed age: 6.196631089655694days > > > > mesos master logs: > -------------------------- > ubuntu@master:/var/log/mesos$ tail -f mesos-master.{ERROR,INFO,WARNING} > > ==> mesos-master.INFO <== > I1204 11:21:06.762050 8915 master.cpp:4967] Sending 1 offers to framework > 1669be41-1157-4e33-a374-7d1e30674eb3-0000 (MyriadAlpha) at > scheduler-5435ce94-c426-4788-a850-7e57399e31c3@192.168.1.33:59514 > I1204 11:21:06.768584 8914 master.cpp:2918] Processing ACCEPT call for > offers: [ 1669be41-1157-4e33-a374-7d1e30674eb3-O478 ] on slave > 5958fb41-3d1c-4d8d-8d00-d82a0aa6d9e1-S0 at slave(1)@192.168.1.33:5051 > (ubuntu1.dev) for framework 1669be41-1157-4e33-a374-7d1e30674eb3-0000 > (MyriadAlpha) at > scheduler-5435ce94-c426-4788-a850-7e57399e31c3@192.168.1.33:59514 > I1204 11:21:06.768824 8914 hierarchical.hpp:1103] Recovered cpus(*):3.8; > mem(*):1694.6; disk(*):27094; ports(*):[31000-31111, 31113-31443, > 31445-31816, 31818-31875, 31877-32000] (total: cpus(*):8; mem(*):6917; > disk(*):27094; ports(*):[31000-32000], allocated: cpus(*):4.2; > mem(*):5222.4; ports(*):[31112-31112, 31444-31444, 31817-31817, > 31876-31876]) on slave 5958fb41-3d1c-4d8d-8d00-d82a0aa6d9e1-S0 from > framework 1669be41-1157-4e33-a374-7d1e30674eb3-0000 > I1204 11:21:10.771049 8917 master.cpp:4967] Sending 1 offers to framework > 1669be41-1157-4e33-a374-7d1e30674eb3-0000 (MyriadAlpha) at > scheduler-5435ce94-c426-4788-a850-7e57399e31c3@192.168.1.33:59514 > I1204 11:21:10.775462 8916 master.cpp:2918] Processing ACCEPT call for > offers: [ 1669be41-1157-4e33-a374-7d1e30674eb3-O479 ] on slave > 2c87707a-156a-4edc-908e-9f90761b32f2-S0 at slave(1)@192.168.1.34:5051 > (192.168.1.34) for framework 1669be41-1157-4e33-a374-7d1e30674eb3-0000 > (MyriadAlpha) at > scheduler-5435ce94-c426-4788-a850-7e57399e31c3@192.168.1.33:59514 > I1204 11:21:10.779511 8915 hierarchical.hpp:1103] Recovered cpus(*):3.7; > mem(*):6660; disk(*):771089; ports(*):[31000-31563, 31565-31926, > 31928-32000] (total: cpus(*):4; 
mem(*):6708; disk(*):771089; > ports(*):[31000-32000], allocated: cpus(*):0.3; mem(*):48; > ports(*):[31564-31564, 31927-31927]) on slave > 2c87707a-156a-4edc-908e-9f90761b32f2-S0 from framework > 1669be41-1157-4e33-a374-7d1e30674eb3-0000 > I1204 11:21:11.773802 8912 master.cpp:4967] Sending 1 offers to framework > 1669be41-1157-4e33-a374-7d1e30674eb3-0000 (MyriadAlpha) at > scheduler-5435ce94-c426-4788-a850-7e57399e31c3@192.168.1.33:59514 > I1204 11:21:11.781312 8913 master.cpp:2918] Processing ACCEPT call for > offers: [ 1669be41-1157-4e33-a374-7d1e30674eb3-O480 ] on slave > 5958fb41-3d1c-4d8d-8d00-d82a0aa6d9e1-S0 at slave(1)@192.168.1.33:5051 > (ubuntu1.dev) for framework 1669be41-1157-4e33-a374-7d1e30674eb3-0000 > (MyriadAlpha) at > scheduler-5435ce94-c426-4788-a850-7e57399e31c3@192.168.1.33:59514 > I1204 11:21:11.781577 8913 hierarchical.hpp:1103] Recovered cpus(*):3.8; > mem(*):1694.6; disk(*):27094; ports(*):[31000-31111, 31113-31443, > 31445-31816, 31818-31875, 31877-32000] (total: cpus(*):8; mem(*):6917; > disk(*):27094; ports(*):[31000-32000], allocated: cpus(*):4.2; > mem(*):5222.4; ports(*):[31112-31112, 31444-31444, 31817-31817, > 31876-31876]) on slave 5958fb41-3d1c-4d8d-8d00-d82a0aa6d9e1-S0 from > framework 1669be41-1157-4e33-a374-7d1e30674eb3-0000 > I1204 11:21:14.238896 8914 http.cpp:336] HTTP GET for /master/state.json > from 192.168.1.3:62643 with User-Agent='Mozilla/5.0 (Macintosh; Intel Mac > OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.73 > Safari/537.36' > > ==> mesos-master.WARNING <== > W1204 11:18:49.456564 8917 master.cpp:4408] Ignoring status update > TASK_LOST (UUID: 7dd56987-06a8-472e-9dfb-49b482d994db) for task > nm.medium.3410a508-4345-49ca-b658-ac3438202d82 of framework > 5958fb41-3d1c-4d8d-8d00-d82a0aa6d9e1-0001 from slave > 5958fb41-3d1c-4d8d-8d00-d82a0aa6d9e1-S0 at slave(1)@192.168.1.33:5051 > (ubuntu1.dev) because the framework is unknown > > ==> mesos-master.INFO <== > I1204 11:21:15.794827 8911 
master.cpp:4967] Sending 1 offers to framework > 1669be41-1157-4e33-a374-7d1e30674eb3-0000 (MyriadAlpha) at > scheduler-5435ce94-c426-4788-a850-7e57399e31c3@192.168.1.33:59514 > I1204 11:21:15.806252 8917 master.cpp:2918] Processing ACCEPT call for > offers: [ 1669be41-1157-4e33-a374-7d1e30674eb3-O481 ] on slave > 2c87707a-156a-4edc-908e-9f90761b32f2-S0 at slave(1)@192.168.1.34:5051 > (192.168.1.34) for framework 1669be41-1157-4e33-a374-7d1e30674eb3-0000 > (MyriadAlpha) at > scheduler-5435ce94-c426-4788-a850-7e57399e31c3@192.168.1.33:59514 > I1204 11:21:15.806967 8917 hierarchical.hpp:1103] Recovered cpus(*):3.7; > mem(*):6660; disk(*):771089; ports(*):[31000-31563, 31565-31926, > 31928-32000] (total: cpus(*):4; mem(*):6708; disk(*):771089; > ports(*):[31000-32000], allocated: cpus(*):0.3; mem(*):48; > ports(*):[31564-31564, 31927-31927]) on slave > 2c87707a-156a-4edc-908e-9f90761b32f2-S0 from framework > 1669be41-1157-4e33-a374-7d1e30674eb3-0000 > I1204 11:21:16.804147 8914 master.cpp:4967] Sending 1 offers to framework > 1669be41-1157-4e33-a374-7d1e30674eb3-0000 (MyriadAlpha) at > scheduler-5435ce94-c426-4788-a850-7e57399e31c3@192.168.1.33:59514 > I1204 11:21:16.811523 8918 master.cpp:2918] Processing ACCEPT call for > offers: [ 1669be41-1157-4e33-a374-7d1e30674eb3-O482 ] on slave > 5958fb41-3d1c-4d8d-8d00-d82a0aa6d9e1-S0 at slave(1)@192.168.1.33:5051 > (ubuntu1.dev) for framework 1669be41-1157-4e33-a374-7d1e30674eb3-0000 > (MyriadAlpha) at > scheduler-5435ce94-c426-4788-a850-7e57399e31c3@192.168.1.33:59514 > I1204 11:21:16.811856 8918 hierarchical.hpp:1103] Recovered cpus(*):3.8; > mem(*):1694.6; disk(*):27094; ports(*):[31000-31111, 31113-31443, > 31445-31816, 31818-31875, 31877-32000] (total: cpus(*):8; mem(*):6917; > disk(*):27094; ports(*):[31000-32000], allocated: cpus(*):4.2; > mem(*):5222.4; ports(*):[31112-31112, 31444-31444, 31817-31817, > 31876-31876]) on slave 5958fb41-3d1c-4d8d-8d00-d82a0aa6d9e1-S0 from > framework 
1669be41-1157-4e33-a374-7d1e30674eb3-0000 > I1204 11:21:20.824043 8911 master.cpp:4967] Sending 1 offers to framework > 1669be41-1157-4e33-a374-7d1e30674eb3-0000 (MyriadAlpha) at > scheduler-5435ce94-c426-4788-a850-7e57399e31c3@192.168.1.33:59514 > I1204 11:21:20.833861 8917 master.cpp:2918] Processing ACCEPT call for > offers: [ 1669be41-1157-4e33-a374-7d1e30674eb3-O483 ] on slave > 2c87707a-156a-4edc-908e-9f90761b32f2-S0 at slave(1)@192.168.1.34:5051 > (192.168.1.34) for framework 1669be41-1157-4e33-a374-7d1e30674eb3-0000 > (MyriadAlpha) at > scheduler-5435ce94-c426-4788-a850-7e57399e31c3@192.168.1.33:59514 > I1204 11:21:20.835531 8913 hierarchical.hpp:1103] Recovered cpus(*):3.7; > mem(*):6660; disk(*):771089; ports(*):[31000-31563, 31565-31926, > 31928-32000] (total: cpus(*):4; mem(*):6708; disk(*):771089; > ports(*):[31000-32000], allocated: cpus(*):0.3; mem(*):48; > ports(*):[31564-31564, 31927-31927]) on slave > 2c87707a-156a-4edc-908e-9f90761b32f2-S0 from framework > 1669be41-1157-4e33-a374-7d1e30674eb3-0000 > I1204 11:21:21.826283 8911 master.cpp:4967] Sending 1 offers to framework > 1669be41-1157-4e33-a374-7d1e30674eb3-0000 (MyriadAlpha) at > scheduler-5435ce94-c426-4788-a850-7e57399e31c3@192.168.1.33:59514 > I1204 11:21:21.835597 8916 master.cpp:2918] Processing ACCEPT call for > offers: [ 1669be41-1157-4e33-a374-7d1e30674eb3-O484 ] on slave > 5958fb41-3d1c-4d8d-8d00-d82a0aa6d9e1-S0 at slave(1)@192.168.1.33:5051 > (ubuntu1.dev) for framework 1669be41-1157-4e33-a374-7d1e30674eb3-0000 > (MyriadAlpha) at > scheduler-5435ce94-c426-4788-a850-7e57399e31c3@192.168.1.33:59514 > I1204 11:21:21.837003 8916 hierarchical.hpp:1103] Recovered cpus(*):3.8; > mem(*):1694.6; disk(*):27094; ports(*):[31000-31111, 31113-31443, > 31445-31816, 31818-31875, 31877-32000] (total: cpus(*):8; mem(*):6917; > disk(*):27094; ports(*):[31000-32000], allocated: cpus(*):4.2; > mem(*):5222.4; ports(*):[31112-31112, 31444-31444, 31817-31817, > 31876-31876]) on slave 
> 5958fb41-3d1c-4d8d-8d00-d82a0aa6d9e1-S0 from
> framework 1669be41-1157-4e33-a374-7d1e30674eb3-0000
>
>
> myriad executor logs:
> ----------------------------
>
> ABORT:
> (/tmp/mesos-build/mesos-repo/3rdparty/libprocess/src/subprocess.cpp:177):
> Failed to os::execvpe in childMain: Argument list too long
> *** Aborted at 1449249974 (unix time) try "date -d @1449249974" if you are using GNU date ***
> PC: @ 0x7fbfd2c66cc9 (unknown)
> *** SIGABRT (@0x5893) received by PID 22675 (TID 0x7fbfcbc4f700) from PID 22675; stack trace: ***
> @ 0x7fbfd3005340 (unknown)
> @ 0x7fbfd2c66cc9 (unknown)
> @ 0x7fbfd2c6a0d8 (unknown)
> @ 0x40a902 _Abort()
> @ 0x40a93c _Abort()
> @ 0x7fbfd477ac3b process::childMain()
> @ 0x7fbfd477cc6d std::_Function_handler<>::_M_invoke()
> @ 0x7fbfd2d2a47d (unknown)
>
>
> On Fri, Dec 4, 2015 at 11:09 AM, Sarjeet Singh <sarjeetsi...@maprtech.com>
> wrote:
>
>> Prabhu,
>>
>> Can you paste/send your mesos-slave and mesos-master log files, if this is OK?
>>
>> P.S. We might have seen this when frameworkUser was not set correctly in
>> myriad-config-default.yml. Can you double check that all configuration
>> settings are correct and that the permissions are OK as well?
>>
>> -Sarjeet
>>
>> On Fri, Dec 4, 2015 at 10:53 AM, Prabhu Inbarajan <
>> inbarajan.pra...@gmail.com> wrote:
>>
>> > I followed the Myriad setup instructions and was able to get the resource
>> > manager to invoke the Myriad scheduler and talk to the Mesos master. But I
>> > see the following error in the Mesos slave logs, and my YARN submissions
>> > are stuck.
>> >
>> > My setup is as follows:
>> > 1. Hadoop 2.7.1
>> > 2. Jdk8
>> > 3. Mesos Version: 0.25.0
>> > 4. 1 master + 2 slaves
>> > 5. ubuntu 14.04 + Kernel Linux master.dev 3.19.0-33-generic
>> > #38~14.04.1-Ubuntu SMP Fri Nov 6 18:17:28 UTC 2015 x86_64 x86_64 x86_64
>> > GNU/Linux
>> >
>> > Given that this team is running with this setup, it is hard for me to
>> > presume this is an argument overflow issue that would require some kind
>> > of kernel recompile:
>> > http://www.linuxjournal.com/article/6060?page=0,0
>> > I am also thinking of recompiling Mesos for better diagnostics. The
>> > subprocess.cpp in master seems to have better logging than the one in
>> > 0.25.0:
>> > https://github.com/apache/mesos/blob/master/3rdparty/libprocess/src/subprocess.cpp
>> >
>> > ABORT:
>> > (/tmp/mesos-build/mesos-repo/3rdparty/libprocess/src/subprocess.cpp:177):
>> > Failed to os::execvpe in childMain: Argument list too long
>> > *** Aborted at 1449220361 (unix time) try "date -d @1449220361" if you are using GNU date ***
>> > PC: @ 0x7fbfd2c66cc9 (unknown)
>> > *** SIGABRT (@0x231d) received by PID 8989 (TID 0x7fbfc944a700) from PID 8989; stack trace: ***
>> > @ 0x7fbfd3005340 (unknown)
>> > @ 0x7fbfd2c66cc9 (unknown)
>> > @ 0x7fbfd2c6a0d8 (unknown)
>> > @ 0x40a902 _Abort()
>> > @ 0x40a93c _Abort()
>> > @ 0x7fbfd477ac3b process::childMain()
>> > @ 0x7fbfd477cc6d std::_Function_handler<>::_M_invoke()
>> > @ 0x7fbfd2d2a47d (unknown)
>> >
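
"Argument list too long" is the text for errno E2BIG, which execve() returns when the argv and environment strings handed to the new process are too large. On Linux this can be caused by the total size or by any single string exceeding 32 pages, so one very long environment variable may be enough to trigger it. On kernels since 2.6.23 the total limit follows the stack rlimit, so a kernel recompile should not be needed. Below is a minimal Python sketch for checking an environment against both limits. The helper names exec_payload_size and oversized_strings are hypothetical, the 4 KiB page size and 8-byte pointer size are assumptions, and the numbers are approximations of what the kernel counts, not an exact reproduction.

```python
import os

# Linux caps each single argv/envp string at 32 pages (see execve(2)).
# Assumption: 4 KiB pages; other platforms and page sizes differ.
MAX_ARG_STRLEN = 32 * 4096


def exec_payload_size(argv, env, pointer_size=8):
    """Approximate bytes execve() must copy for argv and env.

    Counts each string, its terminating NUL, and one pointer per string.
    pointer_size=8 assumes a 64-bit process.
    """
    strings = list(argv) + [f"{k}={v}" for k, v in env.items()]
    return sum(len(s.encode()) + 1 + pointer_size for s in strings)


def oversized_strings(argv, env, limit=MAX_ARG_STRLEN):
    """Return labels of argv entries / env vars whose single string exceeds limit."""
    bad = [f"argv[{i}]" for i, a in enumerate(argv)
           if len(a.encode()) + 1 > limit]
    bad += [k for k, v in env.items()
            if len(f"{k}={v}".encode()) + 1 > limit]
    return bad


if __name__ == "__main__":
    argv = ["/bin/true"]          # placeholder command line
    env = dict(os.environ)        # environment of the current process
    total = exec_payload_size(argv, env)
    arg_max = os.sysconf("SC_ARG_MAX")
    print(f"approx payload: {total} bytes, ARG_MAX: {arg_max} bytes")
    print("oversized strings:", oversized_strings(argv, env) or "none")
```

Running this with the environment that the slave passes to the executor (for example, dumped from a wrapper script) would show whether a single variable has grown past the per-string limit.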