While digging into the code and trying to figure out the code flow, I
noticed something that I believe could be the issue with my setup. Because
my Mesos agents have private IPs and I use the firewall's public address as
--advertise_ip, the executor (ExecutorInfo) receives my firewall address as
its host, whereas the executor's host should be localhost.
My executor's command name-value pairs look like this:
command {
  environment {
    variables {
      name: "MARATHON_APP_VERSION"
      value: "2016-09-09T06:15:47.986Z"
    }
    variables {
      name: "HOST"
      value: "csfirewall.binghamton.edu"
    }
    variables {
      name: "MARATHON_APP_RESOURCE_CPUS"
      value: "1.0"
    }
    variables {
      name: "PORT_10000"
      value: "8083"
    }
    variables {
      name: "MESOS_TASK_ID"
      value: "test-job.34bfa7a5-7760-11e6-aaea-0cc47a49cf86"
    }
    variables {
      name: "PORT"
      value: "8083"
    }
    variables {
      name: "MARATHON_APP_RESOURCE_MEM"
      value: "128.0"
    }
    variables {
      name: "PORTS"
      value: "8083"
    }
    variables {
      name: "MARATHON_APP_RESOURCE_DISK"
      value: "128.0"
    }
    variables {
      name: "MARATHON_APP_LABELS"
      value: ""
    }
    variables {
      name: "MARATHON_APP_ID"
      value: "/test-job"
    }
    variables {
      name: "PORT0"
      value: "8083"
    }
  }
  value: "/home/pankaj/mesos-0.28.2/build/src/mesos-executor"
  shell: true
}
ExecutorInfo is a Java class used via JNI. Can anyone please help me
figure out how this executor info gets built? If I can change the hostname
value before the executor runs, I believe that will resolve my
job-execution issue.
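One experiment I am considering: if the HOST value is derived from the
agent's reported hostname, restarting the agent with an explicit --hostname
might change it. This is only a sketch using the flag values from my
agent's startup log below; whether the framework fills HOST from this value
is my assumption.

```shell
# Sketch (assumption: HOST is filled from the agent's reported hostname).
# --hostname is a standard mesos-slave flag; the master, port, and
# advertise_ip values are the ones from the agent startup log in this thread.
./bin/mesos-slave.sh \
  --master=129.114.110.143:5050 \
  --advertise_ip=128.226.116.69 \
  --hostname=10.11.12.13 \
  --port=8082 \
  --work_dir=/var/lib/mesos
```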
Thanks
Pankaj
On Thu, Sep 1, 2016 at 2:48 PM, Pankaj Saha <[email protected]> wrote:
> I am using my local system as a slave, which is behind the campus
> firewall. The firewall allows incoming requests only on ports 8082 to
> 8090. My slave system has a private IP address, so I am using NAT at the
> firewall, which translates all incoming requests on the firewall's ports
> 8082-8090 to my private-IP machine.
>
> The firewall has a public IP, 128.x.y.z, and the Mesos slave has a
> private IP, 10.11.12.13.
>
> Here is the NAT rule: 128.x.y.z ports 8082-8090 are translated to
> 10.11.12.13 ports 8082-8090.
> I am using the firewall's address as --advertise_ip=128.x.y.z, which
> registers 128.x.y.z:8082 as a slave. Jobs now reach my private-IP machine
> through that NAT translation, but the containers keep failing and
> relaunching again and again.
>
> The Mesos master is running on 5050.
> The Mesos slave is running on 8082.
> The Mesos ports offered as resources are 8083-8084.
> The Linux ephemeral port range on the slave machine has been changed to
> 8085-8090.
>
> I hope this explains my setup. I am planning to make changes to the
> source code, if required, to get this kind of setup working.
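For reference, the NAT rule described above corresponds roughly to a Linux
iptables DNAT rule like the following. This is only an illustration: the
real rule lives on the campus firewall appliance, and 128.x.y.z stays a
placeholder as in the rest of this thread.

```shell
# Illustration only: a DNAT rule equivalent to the firewall's port-forward.
# 128.x.y.z is the placeholder public IP used in this thread; substitute the
# real address before use.
iptables -t nat -A PREROUTING -d 128.x.y.z -p tcp --dport 8082:8090 \
  -j DNAT --to-destination 10.11.12.13
```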
>
> On Thu, Sep 1, 2016 at 11:04 AM, haosdent <[email protected]> wrote:
>
>> If you use Linux, you could execute the following command on every Mesos
>> Agent to make ephemeral ports be assigned within the 5000-6000 range:
>>
>> echo "5000 6000" > /proc/sys/net/ipv4/ip_local_port_range
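That change is lost on reboot. To make it persistent, the same setting can
go through sysctl (a standard Linux mechanism; the config-file path below is
one common convention, not required):

```shell
# Apply the ephemeral-port range now and persist it across reboots.
# net.ipv4.ip_local_port_range is the sysctl behind the /proc path above.
sysctl -w net.ipv4.ip_local_port_range="5000 6000"
echo "net.ipv4.ip_local_port_range = 5000 6000" > /etc/sysctl.d/99-ephemeral-ports.conf
# Verify the active range:
cat /proc/sys/net/ipv4/ip_local_port_range
```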
>>
>> On Thu, Sep 1, 2016 at 3:12 PM, Vinod Kone <[email protected]> wrote:
>>
>> > AFAICT, your agent is listening on port 8082 and not the default 5051.
>> > ---------
>> > I0829 14:24:21.750063 2679 slave.cpp:193] Slave started on 1)@
>> > 128.226.116.69:8082
>> > ------------
>> >
>> > The fact that the agent is receiving a task from the master means that
>> > the firewall on the agent allows incoming connections to 8082. So I'm
>> > surprised that a local connection from the executor to the agent is
>> > being denied. What exactly are your firewall rules on the agent?
>> >
>> > Also, can you share the stderr/stdout of an example executor?
>> >
>> >
>> > On Wed, Aug 31, 2016 at 6:18 PM, Pankaj Saha <[email protected]>
>> > wrote:
>> >
>> > > I think the executor tries to register by communicating with the
>> > > Mesos master, and it fails due to the network restriction.
>> > > How can I change the /tmp/ path? I have specified /var/lib/mesos as
>> > > my work_dir.
>> > >
>> > >
>> > > *Here is my setup:*
>> > > I have a Mesos setup where the master and slave both run on the same
>> > > network on my university campus. The Mesos agent node sits behind a
>> > > firewall, and only ports 5000 to 6000 are open for incoming traffic,
>> > > whereas the Mesos master has no such restrictions. I am running the
>> > > master service on master:5050, and the agent runs on agent:5051 by
>> > > default.
>> > >
>> > > I can see the agent communicating correctly with the master and
>> > > offering the available resources. I have set the available agent
>> > > ports to ports:[5001-6000] in the *src/slave/constants.cpp* file so
>> > > that the framework can communicate only through the ports that are
>> > > open for my agent system behind the firewall.
>> > >
>> > > Now, when I launch jobs through the Mesosphere Marathon framework, I
>> > > can see all jobs connecting to the mesos-agent through the specified
>> > > port range [5001-6000], but my jobs do not get submitted. So I
>> > > started debugging and realized that, when launching jobs, the Mesos
>> > > slave creates and launches an executor (*src/executor/executor.cpp*)
>> > > that communicates with the Mesos master through a random port, which
>> > > falls outside my available range of open ports (5000-6000). Since my
>> > > agent machine cannot accept requests on those ports, the executor
>> > > times out and is restarted again and again after every 1-minute
>> > > limit.
>> > >
>> > > I could not find where exactly that random port is assigned. Is
>> > > there a socket connection we can change so that the executor
>> > > connects on a desired range of ports? Please let me know whether my
>> > > understanding is correct and how I can change those ports for
>> > > executor registration.
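On where the random port comes from: the executor links against libprocess,
which picks its listening port when it initializes, and libprocess honors
the LIBPROCESS_IP and LIBPROCESS_PORT environment variables. A possible
experiment, with two caveats: I have not verified whether the agent
overrides these variables for its executors, and a single fixed port would
collide as soon as two executors run on the same host.

```shell
# Untested experiment: ask libprocess in the executor to bind a known
# address/port instead of a random one. Whether these variables reach the
# executor process unmodified is an assumption.
export LIBPROCESS_IP=10.11.12.13   # the agent's private IP from this thread
export LIBPROCESS_PORT=5999        # a port inside the open 5000-6000 range
```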
>> > >
>> > > On Wed, Aug 31, 2016 at 3:09 AM, haosdent <[email protected]> wrote:
>> > >
>> > > > > I0829 14:27:38.322805 2700 slave.cpp:4307] Terminating executor
>> > > > > 'test.1fb85a35-6e16-11e6-bec9-c27afc834a0c' of framework
>> > > > > c796100f-9ecb-46fa-90a2-72ad649c5dd3-0000 because it did not
>> > > > > register within 1mins
>> > > >
>> > > > This log looks weird. Could you find anything in the stdout/stderr
>> > > > of the executor? For the executor
>> > > > 'test.1fb85a35-6e16-11e6-bec9-c27afc834a0c' above, it should be
>> > > > under the folder
>> > > > '/tmp/mesos/slaves/d6f0e3e2-d144-4275-9d38-82327408622b-S12/frameworks/c796100f-9ecb-46fa-90a2-72ad649c5dd3-0000/executors/test.1fb85a35-6e16-11e6-bec9-c27afc834a0c/runs/dff399f0-beb1-4c49-bd8e-c19621de2f71/'
>> > > >
>> > > > Apart from that, running Mesos under '/tmp' is not recommended.
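To locate those executor stdout/stderr files without typing the full
run-specific path, a small sketch (the directory layout follows the sandbox
path above; WORK_DIR is the agent's --work_dir, /tmp/mesos in this thread):

```shell
# List executor stdout/stderr files under the agent work_dir.
# The slaves/<id>/frameworks/<id>/executors/<id>/runs/<id>/ layout matches
# the sandbox path quoted above.
WORK_DIR=${WORK_DIR:-/tmp/mesos}
find "$WORK_DIR/slaves" -type f \( -name stdout -o -name stderr \) 2>/dev/null || true
```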
>> > > >
>> > > > On Tue, Aug 30, 2016 at 2:32 AM, Pankaj Saha <[email protected]
>> >
>> > > > wrote:
>> > > >
>> > > > > here is the log:
>> > > > >
>> > > > >
>> > > > >
>> > > > > I0829 14:24:21.727960 2679 main.cpp:223] Build: 2016-08-28
>> 13:39:46
>> > by
>> > > > > root
>> > > > > I0829 14:24:21.728159 2679 main.cpp:225] Version: 0.28.2
>> > > > > I0829 14:24:21.733256 2679 containerizer.cpp:149] Using
>> isolation:
>> > > > > posix/cpu,posix/mem,filesystem/posix
>> > > > > I0829 14:24:21.738895 2679 linux_launcher.cpp:101] Using
>> > > > > /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux
>> > launcher
>> > > > > I0829 14:24:21.748019 2679 main.cpp:328] Starting Mesos slave
>> > > > > I0829 14:24:21.750063 2679 slave.cpp:193] Slave started on 1)@
>> > > > > 128.226.116.69:8082
>> > > > > I0829 14:24:21.750114 2679 slave.cpp:194] Flags at startup:
>> > > > > --advertise_ip="128.226.116.69" --appc_simple_discovery_uri_
>> > > > > prefix="http://"
>> > > > > --appc_store_dir="/tmp/mesos/store/appc"
>> --authenticatee="crammd5"
>> > > > > --cgroups_cpu_enable_pids_and_tids_count="false"
>> > > > > --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup"
>> > > > > --cgroups_limit_swap="false" --cgroups_root="mesos"
>> > > > > --container_disk_watch_interval="15secs" --containerizers="mesos"
>> > > > > --default_role="*" --disk_watch_interval="1mins" --docker="docker"
>> > > > > --docker_kill_orphans="true" --docker_registry="https://
>> > > > > registry-1.docker.io"
>> > > > > --docker_remove_delay="6hrs" --docker_socket="/var/run/dock
>> er.sock"
>> > > > > --docker_stop_timeout="0ns" --docker_store_dir="/tmp/
>> > > mesos/store/docker"
>> > > > > --enforce_container_disk_quota="false"
>> > > > > --executor_registration_timeout="1mins"
>> > > > > --executor_shutdown_grace_period="5secs"
>> > > > > --fetcher_cache_dir="/tmp/mesos/fetch" --fetcher_cache_size="2GB"
>> > > > > --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1"
>> > > > > --hadoop_home="" --help="false" --hostname_lookup="true"
>> > > > > --image_provisioner_backend="copy" --initialize_driver_logging="
>> > true"
>> > > > > --isolation="posix/cpu,posix/mem"
>> > > > > --launcher_dir="/home/pankaj/mesos-0.28.2/build/src"
>> > --logbufsecs="0"
>> > > > > --logging_level="INFO" --master="129.114.110.143:5050"
>> > > > > --oversubscribed_resources_interval="15secs"
>> > --perf_duration="10secs"
>> > > > > --perf_interval="1mins" --port="8082"
>> --qos_correction_interval_min=
>> > > > "0ns"
>> > > > > --quiet="false" --recover="reconnect" --recovery_timeout="15mins"
>> > > > > --registration_backoff_factor="1secs"
>> --revocable_cpu_low_priority="
>> > > > true"
>> > > > > --sandbox_directory="/mnt/mesos/sandbox" --strict="true"
>> > > > > --switch_user="true" --systemd_enable_support="true"
>> > > > > --systemd_runtime_directory="/run/systemd/system"
>> --version="false"
>> > > > > --work_dir="/tmp/mesos"
>> > > > > I0829 14:24:21.753572 2679 slave.cpp:464] Slave resources:
>> > cpus(*):2;
>> > > > > mem(*):2855; disk(*):84691; ports(*):[8081-8081]
>> > > > > I0829 14:24:21.753706 2679 slave.cpp:472] Slave attributes: [ ]
>> > > > > I0829 14:24:21.753762 2679 slave.cpp:477] Slave hostname:
>> > > > > venom.cs.binghamton.edu
>> > > > > I0829 14:24:21.770992 2696 state.cpp:58] Recovering state from
>> > > > > '/tmp/mesos/meta'
>> > > > > I0829 14:24:21.771304 2696 state.cpp:698] No checkpointed
>> resources
>> > > > found
>> > > > > at '/tmp/mesos/meta/resources/resources.info'
>> > > > > I0829 14:24:21.771644 2696 state.cpp:101] Failed to find the
>> latest
>> > > > slave
>> > > > > from '/tmp/mesos/meta'
>> > > > > I0829 14:24:21.772583 2696 status_update_manager.cpp:200]
>> Recovering
>> > > > > status update manager
>> > > > > I0829 14:24:21.773082 2698 containerizer.cpp:407] Recovering
>> > > > containerizer
>> > > > > I0829 14:24:21.777489 2702 provisioner.cpp:245] Provisioner
>> recovery
>> > > > > complete
>> > > > > I0829 14:24:21.778149 2699 slave.cpp:4550] Finished recovery
>> > > > > I0829 14:24:21.779564 2699 slave.cpp:796] New master detected at
>> > > > > [email protected]:5050
>> > > > > I0829 14:24:21.779742 2697 status_update_manager.cpp:174] Pausing
>> > > > sending
>> > > > > status updates
>> > > > > I0829 14:24:21.780607 2699 slave.cpp:821] No credentials
>> provided.
>> > > > > Attempting to register without authentication
>> > > > > I0829 14:24:21.781394 2699 slave.cpp:832] Detecting new master
>> > > > > I0829 14:24:22.698812 2702 slave.cpp:971] Registered with master
>> > > > > [email protected]:5050; given slave ID
>> > > > > d6f0e3e2-d144-4275-9d38-82327408622b-S12
>> > > > > I0829 14:24:22.699113 2698 status_update_manager.cpp:181]
>> Resuming
>> > > > sending
>> > > > > status updates
>> > > > > I0829 14:24:22.700258 2702 slave.cpp:1030] Forwarding total
>> > > > oversubscribed
>> > > > > resources
>> > > > > I0829 14:24:43.638958 2695 http.cpp:190] HTTP GET for
>> > /slave(1)/state
>> > > > from
>> > > > > 128.226.119.78:59261 with User-Agent='Mozilla/5.0 (X11; Linux
>> > x86_64)
>> > > > > AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.86
>> > > Safari/537.36'
>> > > > > I0829 14:25:21.764268 2702 slave.cpp:4359] Current disk usage
>> 9.67%.
>> > > Max
>> > > > > allowed age: 5.622868987169502days
>> > > > > I0829 14:26:21.778849 2695 slave.cpp:4359] Current disk usage
>> 9.67%.
>> > > Max
>> > > > > allowed age: 5.622860462326585days
>> > > > > I0829 14:26:38.271085 2698 slave.cpp:1361] Got assigned task
>> > > > > test.1fb85a35-6e16-11e6-bec9-c27afc834a0c for framework
>> > > > > c796100f-9ecb-46fa-90a2-72ad649c5dd3-0000
>> > > > > I0829 14:26:38.311063 2698 slave.cpp:1480] Launching task
>> > > > > test.1fb85a35-6e16-11e6-bec9-c27afc834a0c for framework
>> > > > > c796100f-9ecb-46fa-90a2-72ad649c5dd3-0000
>> > > > > I0829 14:26:38.314755 2698 paths.cpp:528] Trying to chown
>> > > > > '/tmp/mesos/slaves/d6f0e3e2-d144-4275-9d38-82327408622b-
>> > > > > S12/frameworks/c796100f-9ecb-46fa-90a2-72ad649c5dd3-0000/
>> > > > > executors/test.1fb85a35-6e16-11e6-bec9-c27afc834a0c/runs/
>> > > > > dff399f0-beb1-4c49-bd8e-c19621de2f71'
>> > > > > to user 'root'
>> > > > > I0829 14:26:38.320300 2698 slave.cpp:5352] Launching executor
>> > > > > test.1fb85a35-6e16-11e6-bec9-c27afc834a0c of framework
>> > > > > c796100f-9ecb-46fa-90a2-72ad649c5dd3-0000 with resources
>> > cpus(*):0.1;
>> > > > > mem(*):32 in work directory
>> > > > > '/tmp/mesos/slaves/d6f0e3e2-d144-4275-9d38-82327408622b-
>> > > > > S12/frameworks/c796100f-9ecb-46fa-90a2-72ad649c5dd3-0000/
>> > > > > executors/test.1fb85a35-6e16-11e6-bec9-c27afc834a0c/runs/
>> > > > > dff399f0-beb1-4c49-bd8e-c19621de2f71'
>> > > > > I0829 14:26:38.321523 2702 containerizer.cpp:666] Starting
>> container
>> > > > > 'dff399f0-beb1-4c49-bd8e-c19621de2f71' for executor
>> > > > > 'test.1fb85a35-6e16-11e6-bec9-c27afc834a0c' of framework
>> > > > > 'c796100f-9ecb-46fa-90a2-72ad649c5dd3-0000'
>> > > > > I0829 14:26:38.322588 2698 slave.cpp:1698] Queuing task
>> > > > > 'test.1fb85a35-6e16-11e6-bec9-c27afc834a0c' for executor
>> > > > > 'test.1fb85a35-6e16-11e6-bec9-c27afc834a0c' of framework
>> > > > > c796100f-9ecb-46fa-90a2-72ad649c5dd3-0000
>> > > > > I0829 14:26:38.358906 2702 linux_launcher.cpp:304] Cloning child
>> > > process
>> > > > > with flags =
>> > > > > I0829 14:26:38.366492 2702 containerizer.cpp:1179] Checkpointing
>> > > > > executor's forked pid 2758 to
>> > > > > '/tmp/mesos/meta/slaves/d6f0e3e2-d144-4275-9d38-
>> > > > > 82327408622b-S12/frameworks/c796100f-9ecb-46fa-90a2-
>> > > > > 72ad649c5dd3-0000/executors/test.1fb85a35-6e16-11e6-bec9-
>> > > > > c27afc834a0c/runs/dff399f0-beb1-4c49-bd8e-c19621de2f71/
>> > > pids/forked.pid'
>> > > > > I0829 14:27:21.779755 2701 slave.cpp:4359] Current disk usage
>> 9.67%.
>> > > Max
>> > > > > allowed age: 5.622850415190289days
>> > > > > I0829 14:27:38.322805 2700 slave.cpp:4307] Terminating executor
>> > > > > 'test.1fb85a35-6e16-11e6-bec9-c27afc834a0c' of framework
>> > > > > c796100f-9ecb-46fa-90a2-72ad649c5dd3-0000 because it did not
>> > > > > register within 1mins
>> > > > > I0829 14:27:38.323226 2700 containerizer.cpp:1453] Destroying
>> > > > > container 'dff399f0-beb1-4c49-bd8e-c19621de2f71'
>> > > > > I0829 14:27:38.329186 2702 cgroups.cpp:2427] Freezing cgroup
>> > > > > /sys/fs/cgroup/freezer/mesos/dff399f0-beb1-4c49-bd8e-c19621de2f71
>> > > > > I0829 14:27:38.331509 2699 cgroups.cpp:1409] Successfully froze
>> > cgroup
>> > > > > /sys/fs/cgroup/freezer/mesos/dff399f0-beb1-4c49-bd8e-c19621de2f71
>> > > after
>> > > > > 2.19392ms
>> > > > > I0829 14:27:38.334520 2698 cgroups.cpp:2445] Thawing cgroup
>> > > > > /sys/fs/cgroup/freezer/mesos/dff399f0-beb1-4c49-bd8e-c19621de2f71
>> > > > > I0829 14:27:38.337821 2698 cgroups.cpp:1438] Successfullly thawed
>> > > cgroup
>> > > > > /sys/fs/cgroup/freezer/mesos/dff399f0-beb1-4c49-bd8e-c19621de2f71
>> > > after
>> > > > > 3.194112ms
>> > > > > I0829 14:27:38.435214 2696 containerizer.cpp:1689] Executor for
>> > > > container
>> > > > > 'dff399f0-beb1-4c49-bd8e-c19621de2f71' has exited
>> > > > > I0829 14:27:38.441556 2695 provisioner.cpp:306] Ignoring destroy
>> > > request
>> > > > > for unknown container dff399f0-beb1-4c49-bd8e-c19621de2f71
>> > > > > I0829 14:27:38.442186 2695 slave.cpp:3871] Executor
>> > > > > 'test.1fb85a35-6e16-11e6-bec9-c27afc834a0c' of framework
>> > > > > c796100f-9ecb-46fa-90a2-72ad649c5dd3-0000 terminated with signal
>> > > Killed
>> > > > > I0829 14:27:38.445689 2695 slave.cpp:3012] Handling status update
>> > > > > TASK_FAILED (UUID: 154a67ea-f6d5-4e9c-ad3d-9b09161ba34d) for task
>> > > > > test.1fb85a35-6e16-11e6-bec9-c27afc834a0c of framework
>> > > > > c796100f-9ecb-46fa-90a2-72ad649c5dd3-0000 from @0.0.0.0:0
>> > > > > W0829 14:27:38.447599 2702 containerizer.cpp:1295] Ignoring
>> update
>> > for
>> > > > > unknown container: dff399f0-beb1-4c49-bd8e-c19621de2f71
>> > > > > I0829 14:27:38.448391 2702 status_update_manager.cpp:320]
>> Received
>> > > > status
>> > > > > update TASK_FAILED (UUID: 154a67ea-f6d5-4e9c-ad3d-9b09161ba34d)
>> for
>> > > task
>> > > > > test.1fb85a35-6e16-11e6-bec9-c27afc834a0c of framework
>> > > > > c796100f-9ecb-46fa-90a2-72ad649c5dd3-0000
>> > > > > I0829 14:27:38.449525 2702 status_update_manager.cpp:824]
>> > > Checkpointing
>> > > > > UPDATE for status update TASK_FAILED (UUID:
>> > > > > 154a67ea-f6d5-4e9c-ad3d-9b09161ba34d) for task
>> > > > > test.1fb85a35-6e16-11e6-bec9-c27afc834a0c of framework
>> > > > > c796100f-9ecb-46fa-90a2-72ad649c5dd3-0000
>> > > > > I0829 14:27:38.523027 2696 slave.cpp:3410] Forwarding the update
>> > > > > TASK_FAILED (UUID: 154a67ea-f6d5-4e9c-ad3d-9b09161ba34d) for task
>> > > > > test.1fb85a35-6e16-11e6-bec9-c27afc834a0c of framework
>> > > > > c796100f-9ecb-46fa-90a2-72ad649c5dd3-0000 to
>> > > [email protected]:5050
>> > > > > I0829 14:27:38.627722 2698 status_update_manager.cpp:392]
>> Received
>> > > > status
>> > > > > update acknowledgement (UUID: 154a67ea-f6d5-4e9c-ad3d-9b0916
>> 1ba34d)
>> > > for
>> > > > > task test.1fb85a35-6e16-11e6-bec9-c27afc834a0c of framework
>> > > > > c796100f-9ecb-46fa-90a2-72ad649c5dd3-0000
>> > > > > I0829 14:27:38.627943 2698 status_update_manager.cpp:824]
>> > > Checkpointing
>> > > > > ACK for status update TASK_FAILED (UUID:
>> > > > > 154a67ea-f6d5-4e9c-ad3d-9b09161ba34d) for task
>> > > > > test.1fb85a35-6e16-11e6-bec9-c27afc834a0c of framework
>> > > > > c796100f-9ecb-46fa-90a2-72ad649c5dd3-0000
>> > > > > I0829 14:27:38.698822 2701 slave.cpp:3975] Cleaning up executor
>> > > > > 'test.1fb85a35-6e16-11e6-bec9-c27afc834a0c' of framework
>> > > > > c796100f-9ecb-46fa-90a2-72ad649c5dd3-0000
>> > > > > I0829 14:27:38.699582 2698 gc.cpp:55] Scheduling
>> > > > > '/tmp/mesos/slaves/d6f0e3e2-d144-4275-9d38-82327408622b-
>> > > > > S12/frameworks/c796100f-9ecb-46fa-90a2-72ad649c5dd3-0000/
>> > > > > executors/test.1fb85a35-6e16-11e6-bec9-c27afc834a0c/runs/
>> > > > > dff399f0-beb1-4c49-bd8e-c19621de2f71'
>> > > > > for gc 6.99999190486222days in the future
>> > > > > I0829 14:27:38.700202 2698 gc.cpp:55] Scheduling
>> > > > > '/tmp/mesos/slaves/d6f0e3e2-d144-4275-9d38-82327408622b-
>> > > > > S12/frameworks/c796100f-9ecb-46fa-90a2-72ad649c5dd3-0000/
>> > > > > executors/test.1fb85a35-6e16-11e6-bec9-c27afc834a0c'
>> > > > > for gc 6.99999190029037days in the future
>> > > > > I0829 14:27:38.700382 2701 slave.cpp:4063] Cleaning up framework
>> > > > > c796100f-9ecb-46fa-90a2-72ad649c5dd3-0000
>> > > > > I0829 14:27:38.700443 2698 gc.cpp:55] Scheduling
>> > > > > '/tmp/mesos/meta/slaves/d6f0e3e2-d144-4275-9d38-
>> > > > > 82327408622b-S12/frameworks/c796100f-9ecb-46fa-90a2-
>> > > > > 72ad649c5dd3-0000/executors/test.1fb85a35-6e16-11e6-bec9-
>> > > > > c27afc834a0c/runs/dff399f0-beb1-4c49-bd8e-c19621de2f71'
>> > > > > for gc 6.99999189796148days in the future
>> > > > > I0829 14:27:38.700649 2698 gc.cpp:55] Scheduling
>> > > > > '/tmp/mesos/meta/slaves/d6f0e3e2-d144-4275-9d38-
>> > > > > 82327408622b-S12/frameworks/c796100f-9ecb-46fa-90a2-
>> > > > > 72ad649c5dd3-0000/executors/test.1fb85a35-6e16-11e6-bec9-
>> > c27afc834a0c'
>> > > > > for gc 6.99999189622815days in the future
>> > > > > I0829 14:27:38.700845 2698 gc.cpp:55] Scheduling
>> > > > > '/tmp/mesos/slaves/d6f0e3e2-d144-4275-9d38-82327408622b-
>> > > > > S12/frameworks/c796100f-9ecb-46fa-90a2-72ad649c5dd3-0000'
>> > > > > for gc 6.99999189143704days in the future
>> > > > > I0829 14:27:38.701015 2701 status_update_manager.cpp:282] Closing
>> > > status
>> > > > > update streams for framework c796100f-9ecb-46fa-90a2-
>> > 72ad649c5dd3-0000
>> > > > > I0829 14:27:38.701161 2698 gc.cpp:55] Scheduling
>> > > > > '/tmp/mesos/meta/slaves/d6f0e3e2-d144-4275-9d38-
>> > > > > 82327408622b-S12/frameworks/c796100f-9ecb-46fa-90a2-
>> > 72ad649c5dd3-0000'
>> > > > > for gc 6.9999918900237days in the future
>> > > > > I0829 14:27:39.651463 2697 slave.cpp:1361] Got assigned task
>> > > > > test.445696e6-6e16-11e6-bec9-c27afc834a0c for framework
>> > > > > c796100f-9ecb-46fa-90a2-72ad649c5dd3-0000
>> > > > > I0829 14:27:39.655815 2696 gc.cpp:83] Unscheduling
>> > > > > '/tmp/mesos/slaves/d6f0e3e2-d144-4275-9d38-82327408622b-
>> > > > > S12/frameworks/c796100f-9ecb-46fa-90a2-72ad649c5dd3-0000'
>> > > > > from gc
>> > > > > I0829 14:27:39.656445 2696 gc.cpp:83] Unscheduling
>> > > > > '/tmp/mesos/meta/slaves/d6f0e3e2-d144-4275-9d38-
>> > > > > 82327408622b-S12/frameworks/c796100f-9ecb-46fa-90a2-
>> > 72ad649c5dd3-0000'
>> > > > > from gc
>> > > > > I0829 14:27:39.656855 2702 slave.cpp:1480] Launching task
>> > > > > test.445696e6-6e16-11e6-bec9-c27afc834a0c for framework
>> > > > > c796100f-9ecb-46fa-90a2-72ad649c5dd3-0000
>> > > > > I0829 14:27:39.660585 2702 paths.cpp:528] Trying to chown
>> > > > > '/tmp/mesos/slaves/d6f0e3e2-d144-4275-9d38-82327408622b-
>> > > > > S12/frameworks/c796100f-9ecb-46fa-90a2-72ad649c5dd3-0000/
>> > > > > executors/test.445696e6-6e16-11e6-bec9-c27afc834a0c/runs/
>> > > > > ea676570-0a2a-49c3-a75c-14e045eb842b'
>> > > > > to user 'root'
>> > > > > I0829 14:27:39.666008 2702 slave.cpp:5352] Launching executor
>> > > > > test.445696e6-6e16-11e6-bec9-c27afc834a0c of framework
>> > > > > c796100f-9ecb-46fa-90a2-72ad649c5dd3-0000 with resources
>> > cpus(*):0.1;
>> > > > > mem(*):32 in work directory
>> > > > > '/tmp/mesos/slaves/d6f0e3e2-d144-4275-9d38-82327408622b-
>> > > > > S12/frameworks/c796100f-9ecb-46fa-90a2-72ad649c5dd3-0000/
>> > > > > executors/test.445696e6-6e16-11e6-bec9-c27afc834a0c/runs/
>> > > > > ea676570-0a2a-49c3-a75c-14e045eb842b'
>> > > > > I0829 14:27:39.667603 2702 slave.cpp:1698] Queuing task
>> > > > > 'test.445696e6-6e16-11e6-bec9-c27afc834a0c' for executor
>> > > > > 'test.445696e6-6e16-11e6-bec9-c27afc834a0c' of framework
>> > > > > c796100f-9ecb-46fa-90a2-72ad649c5dd3-0000
>> > > > > I0829 14:27:39.668207 2702 containerizer.cpp:666] Starting
>> container
>> > > > > 'ea676570-0a2a-49c3-a75c-14e045eb842b' for executor
>> > > > > 'test.445696e6-6e16-11e6-bec9-c27afc834a0c' of framework
>> > > > > 'c796100f-9ecb-46fa-90a2-72ad649c5dd3-0000'
>> > > > > I0829 14:27:39.678665 2702 linux_launcher.cpp:304] Cloning child
>> > > process
>> > > > > with flags =
>> > > > > I0829 14:27:39.681824 2702 containerizer.cpp:1179] Checkpointing
>> > > > > executor's forked pid 2799 to
>> > > > > '/tmp/mesos/meta/slaves/d6f0e3e2-d144-4275-9d38-
>> > > > > 82327408622b-S12/frameworks/c796100f-9ecb-46fa-90a2-
>> > > > > 72ad649c5dd3-0000/executors/test.445696e6-6e16-11e6-bec9-
>> > > > > c27afc834a0c/runs/ea676570-0a2a-49c3-a75c-14e045eb842b/
>> > > pids/forked.pid'
>> > > > >
>> > > > > Thanks
>> > > > > Pankaj
>> > > > >
>> > > > >
>> > > > > On Mon, Aug 29, 2016 at 7:25 AM, haosdent <[email protected]>
>> > wrote:
>> > > > >
>> > > > > > Hi @Pankaj, could you provide logs from when "the job is
>> > > > > > getting restarted and a new container is created with a new
>> > > > > > process id"? The logs you provided look normal.
>> > > > > >
>> > > > > > On Mon, Aug 29, 2016 at 5:26 AM, Pankaj Saha <
>> > [email protected]>
>> > > > > > wrote:
>> > > > > >
>> > > > > > > Hi,
>> > > > > > > I am facing an issue with jobs launched onto my Mesos agents.
>> > > > > > > I am trying to launch a job through the Marathon framework,
>> > > > > > > and the job stays in the staged state and does not run.
>> > > > > > > I can see the following log messages on the agent console:
>> > > > > > >
>> > > > > > > Scheduling
>> > > > > > > '/var/lib/mesos-8082/meta/slaves/d6f0e3e2-d144-4275-
>> > > > > > 9d38-82327408622b-S8/
>> > > > > > > frameworks/c796100f-9ecb-46fa-90a2-72ad649c5dd3-0000'
>> > > > > > > for gc 6.99999884239407days in the future
>> > > > > > > I0828 16:20:36.053483 28512 slave.cpp:1361] *Got assigned task
>> > > > > > > test-crixus*.eb66a42b-6d5c-11e6-bec9-c27afc834a0c
>> > > > > > > for framework c796100f-9ecb-46fa-90a2-72ad649c5dd3-0000
>> > > > > > > I0828 16:20:36.056224 28510 gc.cpp:83] Unscheduling
>> > > > > > > '/var/lib/mesos-8082/slaves/d6f0e3e2-d144-4275-9d38-
>> > > > > > > 82327408622b-S8/frameworks/c796100f-9ecb-46fa-90a2-
>> > > > 72ad649c5dd3-0000'
>> > > > > > > from gc
>> > > > > > > I0828 16:20:36.056715 28510 gc.cpp:83] Unscheduling
>> > > > > > > '/var/lib/mesos-8082/meta/slaves/d6f0e3e2-d144-4275-
>> > > > > > 9d38-82327408622b-S8/
>> > > > > > > frameworks/c796100f-9ecb-46fa-90a2-72ad649c5dd3-0000'
>> > > > > > > from gc
>> > > > > > > I0828 16:20:36.057231 28509 slave.cpp:1480] *Launching task
>> > > > > > > test-crixus*.eb66a42b-6d5c-11e6-bec9-c27afc834a0c
>> > > > > > > for framework c796100f-9ecb-46fa-90a2-72ad649c5dd3-0000
>> > > > > > > I0828 16:20:36.058661 28509 paths.cpp:528]* Trying to chown*
>> > > > > > > '/var/lib/mesos-8082/slaves/d6f0e3e2-d144-4275-9d38-
>> > > > > > > 82327408622b-S8/frameworks/c796100f-9ecb-46fa-90a2-
>> > > > > > > 72ad649c5dd3-0000/executors/test-crixus.eb66a42b-6d5c-
>> > > > > > > 11e6-bec9-c27afc834a0c/runs/99620406-87b5-406c-a88b-
>> > 13adb145c12d'
>> > > > > > > to user 'root'
>> > > > > > > I0828 16:20:36.067807 28509 slave.cpp:5352]* Launching
>> executor
>> > > > > > > test-crixus*.eb66a42b-6d5c-11e6-bec9-c27afc834a0c
>> > > > > > > of framework c796100f-9ecb-46fa-90a2-72ad649c5dd3-0000 with
>> > > > resources
>> > > > > > > cpus(*):0.1; mem(*):32 in work directory
>> > > > > > > '/var/lib/mesos-8082/slaves/d6f0e3e2-d144-4275-9d38-
>> > > > > > > 82327408622b-S8/frameworks/c796100f-9ecb-46fa-90a2-
>> > > > > > > 72ad649c5dd3-0000/executors/test-crixus.eb66a42b-6d5c-
>> > > > > > > 11e6-bec9-c27afc834a0c/runs/99620406-87b5-406c-a88b-
>> > 13adb145c12d'
>> > > > > > > I0828 16:20:36.069314 28509 slave.cpp:1698] *Queuing task
>> > > > > > > 'test-crixus.*eb66a42b-6d5c-11e6-bec9-c27afc834a0c'
>> > > > > > > for executor 'test-crixus.eb66a42b-6d5c-11e
>> 6-bec9-c27afc834a0c'
>> > of
>> > > > > > > framework c796100f-9ecb-46fa-90a2-72ad649c5dd3-0000
>> > > > > > > I0828 16:20:36.069902 28509 containerizer.cpp:666] *Starting
>> > > > container*
>> > > > > > > '99620406-87b5-406c-a88b-13adb145c12d' for executor
>> > > > > > > 'test-crixus.eb66a42b-6d5c-11e6-bec9-c27afc834a0c' of
>> framework
>> > > > > > > 'c796100f-9ecb-46fa-90a2-72ad649c5dd3-0000'
>> > > > > > > I0828 16:20:36.080713 28509 linux_launcher.cpp:304] *Cloning
>> > child
>> > > > > > process*
>> > > > > > > with flags =
>> > > > > > > I0828 16:20:36.084738 28509 containerizer.cpp:1179]
>> > *Checkpointing
>> > > > > > > executor's forked pid 29629* to
>> > > > > > > '/var/lib/mesos-8082/meta/slaves/d6f0e3e2-d144-4275-
>> > > > > > 9d38-82327408622b-S8/
>> > > > > > > frameworks/c796100f-9ecb-46fa-90a2-72ad649c5dd3-0000/
>> > > > > > > executors/test-crixus.eb66a42b-6d5c-11e6-bec9-
>> > > > > > c27afc834a0c/runs/99620406-
>> > > > > > > 87b5-406c-a88b-13adb145c12d/pids/forked.pid'
>> > > > > > >
>> > > > > > >
>> > > > > > > But after that, the job is restarted and a new container is
>> > > > > > > created with a new process id. This happens indefinitely,
>> > > > > > > which keeps the job in the staged state on the mesos-master.
>> > > > > > >
>> > > > > > > The job is nothing but a simple echo "hello world" kind of
>> > > > > > > shell command. Can anyone please point out where it is
>> > > > > > > failing or what I am doing wrong?
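Since the failing app is just an echo command, a minimal reproduction can
be resubmitted directly through Marathon's REST API. The app-definition
fields below are standard Marathon fields; the marathon host and port are
assumptions to be replaced with the actual endpoint.

```shell
# Resubmit a minimal "hello world" app to Marathon for testing.
# Replace marathon-host:8080 with the actual Marathon endpoint.
curl -X POST http://marathon-host:8080/v2/apps \
  -H 'Content-Type: application/json' \
  -d '{"id": "/test-job", "cmd": "echo hello world && sleep 300", "cpus": 0.1, "mem": 32}'
```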
>> > > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > > > Thanks
>> > > > > > > Pankaj
>> > > > > > >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > > --
>> > > > > > Best Regards,
>> > > > > > Haosdent Huang
>> > > > > >
>> > > > >
>> > > >
>> > > >
>> > > >
>> > > > --
>> > > > Best Regards,
>> > > > Haosdent Huang
>> > > >
>> > >
>> >
>>
>>
>>
>> --
>> Best Regards,
>> Haosdent Huang
>>
>
>