[jira] [Commented] (MESOS-5439) registerExecutor problem
    [ https://issues.apache.org/jira/browse/MESOS-5439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15308814#comment-15308814 ]

Gilbert Song commented on MESOS-5439:
-------------------------------------

Hi [~wnghksrla001], are you saying that it is slow only between 'Forked child with pid' and 'Got registration for executor', or that all of the agent logging is slow? If it is the former, it may be related to the executor. Normally this step is quick. You can test this by launching some similar tasks using mesos-execute with the command executor.

> registerExecutor problem
> ------------------------
>
>                 Key: MESOS-5439
>                 URL: https://issues.apache.org/jira/browse/MESOS-5439
>             Project: Mesos
>          Issue Type: Bug
>          Components: c++ api, slave
>    Affects Versions: 0.27.0
>            Reporter: kimjoohwan
>
> Currently, we are using Mesos 0.27.0. The master is built with an Intel(R)
> Core(TM) i5-3470 CPU @ 3.20GHz and 4GB of RAM. The slave (Banana PI) is
> built with a Cortex-A7 dual-core CPU and 1GB of RAM.
> Using the Mesos API, we have developed and run a framework that is based
> on Python.
> However, we found that it takes too long (about 5 seconds) between the
> messages 'Forked child with pid' and 'Got registration for executor' in
> the slave log.
> If you know how to deal with this problem, please let us know.
>
> I0523 17:38:16.264289 1787 slave.cpp:5208] Launching executor default of framework 3fb86eea-96c4-4b07-aaa2-caf071275bdf-0010 with resources in work directory '/tmp/mesos/slaves/3fb86eea-96c4-4b07-aaa2-caf071275bdf-S2/frameworks/3fb86eea-96c4-4b07-aaa2-caf071275bdf-0010/executors/default/runs/1c830c9a-4120-4ef0-af80-49a52d307539'
> I0523 17:38:16.290601 1789 containerizer.cpp:616] Starting container '1c830c9a-4120-4ef0-af80-49a52d307539' for executor 'default' of framework '3fb86eea-96c4-4b07-aaa2-caf071275bdf-0010'
> I0523 17:38:16.293285 1787 slave.cpp:1626] Queuing task '0' for executor 'default' of framework 3fb86eea-96c4-4b07-aaa2-caf071275bdf-0010
> I0523 17:38:16.297369 1787 slave.cpp:4233] Current disk usage 2.14%. Max allowed age: 6.150293798159722days
> I0523 17:38:16.504043 1789 launcher.cpp:132] Forked child with pid '1837' for container '1c830c9a-4120-4ef0-af80-49a52d307539'
> I0523 17:38:21.510535 1785 slave.cpp:2573] Got registration for executor 'default' of framework 3fb86eea-96c4-4b07-aaa2-caf071275bdf-0010 from executor(1)@192.168.0.8:56508
> I0523 17:38:21.554608 1785 slave.cpp:1791] Sending queued task '0' to executor 'default' of framework 3fb86eea-96c4-4b07-aaa2-caf071275bdf-0010 at executor(1)@192.168.0.8:56508
> I0523 17:38:21.594511 1789 slave.cpp:2932] Handling status update TASK_RUNNING (UUID: cd04ec2a-0e68-460a-ad2e-e4f504f3b032) for task 0 of framework 3fb86eea-96c4-4b07-aaa2-caf071275bdf-0010 from executor(1)@192.168.0.8:56508
> I0523 17:38:21.600050 1789 slave.cpp:2932] Handling status update TASK_FINISHED (UUID: 46e110c8-4078-4f98-ae30-30b3a1376034) for task 0 of framework 3fb86eea-96c4-4b07-aaa2-caf071275bdf-0010 from executor(1)@192.168.0.8:56508

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
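The delay described in the report can be read directly off the two log timestamps. Below is a minimal sketch in Python, assuming the second whitespace-separated field of each agent log line is a time of day in HH:MM:SS.microseconds form, as in the lines quoted above. The helper names glog_time and gap_seconds are made up for illustration and are not part of Mesos.

```python
from datetime import datetime

# Two agent log lines copied (shortened) from the report above.
FORKED = "I0523 17:38:16.504043 1789 launcher.cpp:132] Forked child with pid '1837'"
REGISTERED = "I0523 17:38:21.510535 1785 slave.cpp:2573] Got registration for executor 'default'"


def glog_time(line):
    """Parse the time-of-day field of a log line (the date part is ignored)."""
    return datetime.strptime(line.split()[1], "%H:%M:%S.%f")


def gap_seconds(earlier, later):
    """Seconds elapsed between two log lines written on the same day."""
    return (glog_time(later) - glog_time(earlier)).total_seconds()


print("%.6f" % gap_seconds(FORKED, REGISTERED))  # prints 5.006492
```

For these two lines the gap is about 5.006 seconds, which matches the roughly 5-second delay in the report. The entries before the fork and after the registration are only milliseconds apart.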
[jira] [Commented] (MESOS-5439) registerExecutor problem
    [ https://issues.apache.org/jira/browse/MESOS-5439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15305929#comment-15305929 ]

kimjoohwan commented on MESOS-5439:
-----------------------------------

Hello Joseph,

Thank you for your comments.

1. How many tasks are you launching at once? (i.e. from a single offer) And how many over a given time?

I am using this framework:

#!/usr/bin/env python

import os
import sys
import time
import datetime

import mesos.interface
from mesos.interface import mesos_pb2
import mesos.native

TOTAL_TASKS = 32

TASK_CPUS = 1
TASK_MEM = 350


class TestScheduler(mesos.interface.Scheduler):
    def __init__(self, implicitAcknowledgements, executor):
        self.implicitAcknowledgements = implicitAcknowledgements
        self.executor = executor
        self.taskData = {}
        self.tasksLaunched = 0
        self.tasksFinished = 0
        self.messagesSent = 0
        self.messagesReceived = 0
        self.result = " "
        self.data = " "
        self.tasks = []
        self.start = " "
        self.end = " "
        self.finish = " "
        self.time1 = {}
        self.time2 = {}
        self.time3 = {}
        self.time4 = {}
        self.time5 = {}
        self.time6 = {}
        self.time7 = {}
        self.time8 = {}
        self.time0 = {}
        self.count = 0
        self.count2 = 0

    def work1(self, offer):
        tid = self.tasksLaunched
        self.tasksLaunched += 1
        tasks = []

        print "Launching egrep_task %d using offer %s " \
              % (tid, offer.hostname)

        task = mesos_pb2.TaskInfo()
        task.task_id.value = str(tid)
        task.slave_id.value = offer.slave_id.value
        task.name = "task %d" % tid
        executor.executor_id.value = str(tid)
        executor.command.value = os.path.abspath("./work1-executor")
        task.executor.MergeFrom(self.executor)

        cpus = task.resources.add()
        cpus.name = "cpus"
        cpus.type = mesos_pb2.Value.SCALAR
        cpus.scalar.value = TASK_CPUS

        mem = task.resources.add()
        mem.name = "mem"
        mem.type = mesos_pb2.Value.SCALAR
        mem.scalar.value = TASK_MEM

        return task

    def work2(self, offer):
        tasks = []
        tid = self.tasksLaunched
        self.tasksLaunched += 1

        print "Launching wc_task %d using offer %s" \
              % (tid, offer.hostname)

        task = mesos_pb2.TaskInfo()
        task.task_id.value = str(tid)
        task.slave_id.value = offer.slave_id.value
        task.name = "task %d" % tid
        executor.executor_id.value = str(tid)
        executor.command.value = os.path.abspath("./work2-executor")
        task.executor.MergeFrom(self.executor)

        cpus = task.resources.add()
        cpus.name = "cpus"
        cpus.type = mesos_pb2.Value.SCALAR
        cpus.scalar.value = TASK_CPUS

        mem = task.resources.add()
        mem.name = "mem"
        mem.type = mesos_pb2.Value.SCALAR
        mem.scalar.value = TASK_MEM

        print "work2"
        return task

    def work3(self, offer):
        tid = self.tasksLaunched
        self.tasksLaunched += 1
        tasks = []

        print "Launching egrep_task %d using offer %s" \
              % (tid, offer.hostname)

        task = mesos_pb2.TaskInfo()
        task.task_id.value = str(tid)
        task.slave_id.value = offer.slave_id.value
        task.name = "task %d" % tid
        executor.executor_id.value = str(tid)
        executor.command.value = os.path.abspath("./work3-executor")
        task.executor.MergeFrom(self.executor)

        cpus = task.resources.add()
        cpus.name = "cpus"
        cpus.type = mesos_pb2.Value.SCALAR
        cpus.scalar.value = TASK_CPUS

        mem = task.resources.add()
        mem.name = "mem"
        mem.type = mesos_pb2.Value.SCALAR
        mem.scalar.value = TASK_MEM

        return task

    def work4(self, offer):
        tasks = []
        tid = self.tasksLaunched
        self.tasksLaunched += 1

        print "Launching wc_task %d using offer %s" \
              % (tid, offer.hostname)

        task = mesos_pb2.TaskInfo()
        task.task_id.value = str(tid)
        task.slave_id.value = offer.slave_id.value
        task.name = "task %d" % tid
        executor.executor_id.value = str(tid)
        executor.command.value = os.path.abspath("./work4-executor")
        task.executor.MergeFrom(self.executor)

        cpus = task.resources.add()
        cpus.name = "cpus"
        cpus.type = mesos_pb2.Value.SCALAR
        cpus.scalar.value = TASK_CPUS

        mem = task.resources.add()
        mem.name = "mem"
        mem.type = mesos_pb2.Value.SCALAR
        mem.scalar.value = TASK_MEM

        print "work2"
        return task

    def registered(self, driver, frameworkId, masterInfo):
        print "Registered with framework ID %s" % frameworkId.value
        self.start = datetime.datetime.now()
[jira] [Commented] (MESOS-5439) registerExecutor problem
    [ https://issues.apache.org/jira/browse/MESOS-5439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15296759#comment-15296759 ]

Joseph Wu commented on MESOS-5439:
----------------------------------

A couple of questions:
* How many tasks are you launching at once? (i.e. from a single offer) And how many over a given time?
* Are you using the default command executor? Or are you launching a custom executor?
* What flags are you using to launch the agent?
* What do the executor's stdout/stderr files (in the sandbox) say? There should be glog logs in there too.

> registerExecutor problem
> ------------------------
>
>                 Key: MESOS-5439
>                 URL: https://issues.apache.org/jira/browse/MESOS-5439
>             Project: Mesos
>          Issue Type: Bug
>          Components: c++ api, slave
>    Affects Versions: 0.27.0
>            Reporter: kimjoohwan
>
> Currently, we are using Mesos 0.27.0. The master is built with an Intel(R)
> Core(TM) i5-3470 CPU @ 3.20GHz and 4GB of RAM. The slave (Banana PI) is
> built with a Cortex-A7 dual-core CPU and 1GB of RAM.
> Using the Mesos API, we have developed and run a framework that is based
> on Python.
> However, we found that it takes too long (about 5 seconds) between the
> messages 'Forked child with pid' and 'Got registration for executor' in
> the slave log.
> If you know how to deal with this problem, please let us know.
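One way to see what the executor itself is doing during those five seconds is to have the executor script timestamp its own startup on stderr, which ends up in the sandbox files mentioned above. The following is a rough sketch only, assuming the executor is a Python script. The helpers log_phase and timed_import are hypothetical names, and json is just a stand-in for whichever heavy module the real executor imports first.

```python
import importlib
import sys
import time

# Reference point: the moment this module starts executing.
_START = time.time()


def log_phase(name, stream=sys.stderr):
    """Write the seconds elapsed since module load to the stream and return them."""
    elapsed = time.time() - _START
    stream.write("[startup] %-24s +%.3fs\n" % (name, elapsed))
    stream.flush()
    return elapsed


def timed_import(module_name, stream=sys.stderr):
    """Import a module by name and report how long the import took."""
    before = time.time()
    module = importlib.import_module(module_name)
    took = time.time() - before
    stream.write("[startup] import %-17s %.3fs\n" % (module_name, took))
    stream.flush()
    return module, took


log_phase("interpreter up")
# Stand-in module so the sketch runs anywhere (assumption, see above).
timed_import("json")
log_phase("ready to start driver")
```

If most of the time shows up before the last line is written, the delay is in the executor's own startup rather than in the agent.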