Cody Maloney created MESOS-4645:
-----------------------------------

             Summary: Mesos agent shut down on health check timeout rather than 
marked lost and allowed to recover
                 Key: MESOS-4645
                 URL: https://issues.apache.org/jira/browse/MESOS-4645
             Project: Mesos
          Issue Type: Bug
    Affects Versions: 0.27.1
            Reporter: Cody Maloney


I expected slaves to have to be gone for the re-registration timeout before 
they'd be lost to the cluster, not merely fail 5 health checks. (Failing health 
checks indicates a network partition, not that the agent is gone for good and 
will never come back.)

Is there some flag I'm missing here which I should be setting?

From my perspective, I expect frameworks to not get offers for resources on 
agents which haven't been contacted recently (the framework wouldn't be able 
to launch anything on the agent). Once the re-registration period times out, 
the slave would be assumed completely lost and its tasks assumed terminated / 
able to be re-launched if desired. If an agent recovers between the 
healthcheck timeout and the re-registration timeout, it should be able to 
re-join the cluster with its running tasks kept running.
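The expected lifecycle above can be sketched as a small state machine. This is a hypothetical illustration of the behavior I'd expect, not actual Mesos code; all names are invented:

```python
# Hypothetical sketch of the expected agent lifecycle -- not Mesos source.
class AgentState:
    ACTIVE = "active"            # connected, receiving offers
    DEACTIVATED = "deactivated"  # failed health checks: no offers, tasks kept
    REMOVED = "removed"          # re-registration timeout elapsed: tasks lost

def on_health_check_timeout(state):
    # Expected: stop offering the agent's resources, but keep its tasks.
    return AgentState.DEACTIVATED

def on_reregister_timeout(state):
    # Expected: only now assume the agent is gone and its tasks terminated.
    return AgentState.REMOVED

def on_reregister(state):
    # Expected: an agent returning before the re-registration timeout
    # rejoins with its running tasks kept running.
    if state == AgentState.DEACTIVATED:
        return AgentState.ACTIVE
    return state  # a REMOVED agent would have to register with a new ID
```

What the logs below show instead is the master jumping straight from the health check timeout to removal and shutdown.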

Note: some log lines have their start or tail truncated; the critical content 
should all be there.

Master flags
{noformat}
Feb 11 00:22:19 ip-10-0-4-187.us-west-2.compute.internal mesos-master[1362]: 
I0211 00:22:19.690507  1362 master.cpp:369] Flags at startup: 
--allocation_interval="1secs" --allocator="HierarchicalDRF" 
--authenticate="false" --authenticate_slaves="false" --authenticators="crammd5" 
--authorizers="local" --cluster="cody-cm52sd-2" --framework_sorter="drf" 
--help="false" --hostname_lookup="false" --initialize_driver_logging="true" 
--ip_discovery_command="/opt/mesosphere/bin/detect_ip" 
--log_auto_initialize="true" --log_dir="/var/log/mesos" --logbufsecs="0" 
--logging_level="INFO" --max_slave_ping_timeouts="5" --port="5050" 
--quiet="false" --quorum="1" --recovery_slave_removal_limit="100%" 
--registry="replicated_log" --registry_fetch_timeout="1mins" 
--registry_store_timeout="5secs" --registry_strict="false" 
--roles="slave_public" --root_submissions="true" --slave_ping_timeout="15secs" 
--slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
--webui_dir="/opt/mesosphere/packages/mesos--4dd59ec6bde2052f6f2a0a0da415b6c92c3c418a/share/mesos/webui"
 --weights="slave_public=1" --work_dir="/var/lib/mesos/master" 
--zk="zk://127.0.0.1:2181/mesos" --zk_session_timeout="10secs"
{noformat}
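For reference, the window before the master declares a health check timeout follows from two of the flags above, slave_ping_timeout × max_slave_ping_timeouts (per the documented meaning of those flags), which is far shorter than the 10-minute slave_reregister_timeout:

```python
# Effective timeouts implied by the master flags above.
slave_ping_timeout = 15             # --slave_ping_timeout="15secs"
max_slave_ping_timeouts = 5         # --max_slave_ping_timeouts="5"
slave_reregister_timeout = 10 * 60  # --slave_reregister_timeout="10mins"

# An agent is declared unhealthy after max_slave_ping_timeouts missed pings.
health_check_window = slave_ping_timeout * max_slave_ping_timeouts
print(health_check_window)          # 75 seconds
print(slave_reregister_timeout)     # 600 seconds
```

So an agent gone for a bit over a minute gets shut down, even though the re-registration timeout suggests it should have ten minutes to come back.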

Slave flags
{noformat}
Feb 11 00:34:13 ip-10-0-0-52.us-west-2.compute.internal mesos-slave[3914]: 
I0211 00:34:13.334395  3914 slave.cpp:192] Flags at startup: 
--appc_store_dir="/tmp/mesos/store/appc" --authenticatee="crammd5" 
--cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" 
--cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" 
--cgroups_root="mesos" --container_disk_watch_interval="15secs" 
--containerizers="docker,mesos" --default_role="*" 
--disk_watch_interval="1mins" --docker="docker" 
--docker_auth_server="auth.docker.io" --docker_auth_server_port="443" 
--docker_kill_orphans="true" 
--docker_local_archives_dir="/tmp/mesos/images/docker" --docker_puller="local" 
--docker_puller_timeout="60" --docker_registry="registry-1.docker.io" 
--docker_registry_port="443" --docker_remove_delay="1hrs" 
--docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns" 
--docker_store_dir="/tmp/mesos/store/docker" 
--enforce_container_disk_quota="false" 
--executor_environment_variables="{"LD_LIBRARY_PATH":"\/opt\/mesosphere\/lib","PATH":"\/usr\/bin:\/bin","SASL_PATH":"\/opt\/mesosphere\/lib\/sasl2","SHELL":"\/usr\/bin\/bash"}"
 --executor_registration_timeout="5mins" 
--executor_shutdown_grace_period="5secs" --fetcher_cache_dir="/tmp/mesos/fetch" 
--fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="2days" 
--gc_disk_headroom="0.1" --hadoop_home="" --help="false" 
--hostname_lookup="false" --image_provisioner_backend="copy" 
--initialize_driver_logging="true" 
--ip_discovery_command="/opt/mesosphere/bin/detect_ip" 
--isolation="cgroups/cpu,cgroups/mem" 
--launcher_dir="/opt/mesosphere/packages/mesos--4dd59ec6bde2052f6f2a0a0da415b6c92c3c418a/libexec/mesos"
 --log_dir="/var/log/mesos" --logbufsecs="0" --logging_level="INFO" 
--master="zk://leader.mesos:2181/mesos" 
--oversubscribed_resources_interval="15secs" --perf_duration="10secs" 
--perf_interval="1mins" --port="5051" --qos_correction_interval_min="0ns" 
--quiet="false" --recover="reconnect" --recovery_timeout="15mins" 
--registration_backoff_factor="1secs" 
--resources="ports:[1025-2180,2182-3887,3889-5049,5052-8079,8082-8180,8182-32000]"
 --revocable_cpu_low_priority="true" --sandbox_directory="/mnt/mesos/sandbox" 
--slave_subsystems="cpu,memory" --strict="true" --switch_user="true" 
--systemd_runtime_directory="/run/systemd/system" --version="false" 
--work_dir="/var/lib/mesos/slave"
{noformat}

h2. Restarting the slave


{noformat}
Feb 11 00:32:44 ip-10-0-0-52.us-west-2.compute.internal mesos-slave[3261]: 
W0211 00:32:40.981289  3261 logging.cpp:81] RAW: Received signal SIGTERM from 
process 1 of user 0; exiting
Feb 11 00:32:44 ip-10-0-0-52.us-west-2.compute.internal systemd[1]: Stopping 
Mesos Slave...
Feb 11 00:32:44 ip-10-0-0-52.us-west-2.compute.internal systemd[1]: Stopped 
Mesos Slave.
Feb 11 00:34:02 ip-10-0-0-52.us-west-2.compute.internal systemd[1]: Starting 
Mesos Slave...
Feb 11 00:34:02 ip-10-0-0-52.us-west-2.compute.internal ping[3534]: PING 
leader.mesos (10.0.4.187) 56(84) bytes of data.
Feb 11 00:34:02 ip-10-0-0-52.us-west-2.compute.internal ping[3534]: 64 bytes 
from ip-10-0-4-187.us-west-2.compute.internal (10.0.4.187): icmp_seq=1 ttl=64 
time=0.314 ms
Feb 11 00:34:02 ip-10-0-0-52.us-west-2.compute.internal ping[3534]: --- 
leader.mesos ping statistics ---
Feb 11 00:34:02 ip-10-0-0-52.us-west-2.compute.internal ping[3534]: 1 packets 
transmitted, 1 received, 0% packet loss, time 0ms
Feb 11 00:34:02 ip-10-0-0-52.us-west-2.compute.internal ping[3534]: rtt 
min/avg/max/mdev = 0.314/0.314/0.314/0.000 ms
Feb 11 00:34:02 ip-10-0-0-52.us-west-2.compute.internal systemd[1]: Started 
Mesos Slave.
Feb 11 00:34:02 ip-10-0-0-52.us-west-2.compute.internal mesos-slave[3536]: 
I0211 00:34:02.256242  3536 logging.cpp:172] INFO level logging started!
{noformat}

h2. The slave detects the new master, gets shut down for re-registering after 
removal
{noformat}
Feb 11 00:34:04 ip-10-0-0-52.us-west-2.compute.internal mesos-slave[3536]: 
I0211 00:34:04.705356  3546 slave.cpp:729] New master detected at 
master@10.0.4.187:5050
Feb 11 00:34:04 ip-10-0-0-52.us-west-2.compute.internal mesos-slave[3536]: 
I0211 00:34:04.705366  3539 status_update_manager.cpp:176] Pausing sending 
status updates
Feb 11 00:34:04 ip-10-0-0-52.us-west-2.compute.internal mesos-slave[3536]: 
I0211 00:34:04.705550  3546 slave.cpp:754] No credentials provided. Attempting 
to register without authentication
Feb 11 00:34:04 ip-10-0-0-52.us-west-2.compute.internal mesos-slave[3536]: 
I0211 00:34:04.705597  3546 slave.cpp:765] Detecting new master
Feb 11 00:34:05 ip-10-0-0-52.us-west-2.compute.internal mesos-slave[3536]: 
I0211 00:34:05.624832  3544 slave.cpp:643] Slave asked to shut down by 
master@10.0.4.187:5050 because 'Slave attempted to re-register after removal'
Feb 11 00:34:05 ip-10-0-0-52.us-west-2.compute.internal mesos-slave[3536]: 
I0211 00:34:05.624908  3544 slave.cpp:2009] Asked to shut down framework 
0c9ebb3f-23f8-4fce-b276-9ebca1ede0b1-0000 by master@10.0.4.187:5050
Feb 11 00:34:05 ip-10-0-0-52.us-west-2.compute.internal mesos-slave[3536]: 
I0211 00:34:05.624939  3544 slave.cpp:2034] Shutting down framework 
0c9ebb3f-23f8-4fce-b276-9ebca1ede0b1-0000
{noformat}


h2. Master initially registering the slave
{noformat}
Feb 11 00:23:01 ip-10-0-4-187.us-west-2.compute.internal mesos-master[1362]: 
I0211 00:23:01.968310  1373 master.cpp:3859] Registering slave at 
slave(1)@10.0.0.52:5051 (10.0.0.52) with id 
0c9ebb3f-23f8-4fce-b276-9ebca1ede0b1-S0
{noformat}


{noformat}
Feb 11 00:23:01 ip-10-0-4-187.us-west-2.compute.internal mesos-master[1362]: 
I0211 00:23:01.976769  1374 log.cpp:704] Attempting to truncate the log to 3
Feb 11 00:23:01 ip-10-0-4-187.us-west-2.compute.internal mesos-master[1362]: 
I0211 00:23:01.976820  1370 coordinator.cpp:350] Coordinator attempting to 
write TRUNCATE action at position 4
Feb 11 00:23:01 ip-10-0-4-187.us-west-2.compute.internal mesos-master[1362]: 
I0211 00:23:01.977002  1369 replica.cpp:540] Replica received write request for 
position 4 from (13)@10.0.4.187:5050
Feb 11 00:23:01 ip-10-0-4-187.us-west-2.compute.internal mesos-master[1362]: 
I0211 00:23:01.977157  1374 master.cpp:3927] Registered slave 
0c9ebb3f-23f8-4fce-b276-9ebca1ede0b1-S0 at slave(1)@10.0.0.52:5051 (10.0.0.52) 
with ports(*):[1025-
Feb 11 00:23:01 ip-10-0-4-187.us-west-2.compute.internal mesos-master[1362]: 
I0211 00:23:01.977207  1368 hierarchical.cpp:344] Added slave 
0c9ebb3f-23f8-4fce-b276-9ebca1ede0b1-S0 (10.0.0.52) with ports(*):[1025-2180, 
2182-3887, 3889-5049,
Feb 11 00:23:01 ip-10-0-4-187.us-west-2.compute.internal mesos-master[1362]: 
I0211 00:23:01.977552  1368 master.cpp:4979] Sending 1 offers to framework 
0c9ebb3f-23f8-4fce-b276-9ebca1ede0b1-0000 (marathon) at 
scheduler-8174298d-3ef3-4683-9
Feb 11 00:23:01 ip-10-0-4-187.us-west-2.compute.internal mesos-master[1362]: 
I0211 00:23:01.978520  1369 leveldb.cpp:343] Persisting action (16 bytes) to 
leveldb took 1.485099ms
Feb 11 00:23:01 ip-10-0-4-187.us-west-2.compute.internal mesos-master[1362]: 
I0211 00:23:01.978559  1369 replica.cpp:715] Persisted action at 4
Feb 11 00:23:01 ip-10-0-4-187.us-west-2.compute.internal mesos-master[1362]: 
I0211 00:23:01.978710  1369 replica.cpp:694] Replica received learned notice 
for position 4 from @0.0.0.0:0
Feb 11 00:23:01 ip-10-0-4-187.us-west-2.compute.internal mesos-master[1362]: 
I0211 00:23:01.979212  1372 master.cpp:4269] Received update of slave 
0c9ebb3f-23f8-4fce-b276-9ebca1ede0b1-S0 at slave(1)@10.0.0.52:5051 (10.0.0.52) 
with total o
Feb 11 00:23:01 ip-10-0-4-187.us-west-2.compute.internal mesos-master[1362]: 
I0211 00:23:01.979322  1372 hierarchical.cpp:400] Slave 
0c9ebb3f-23f8-4fce-b276-9ebca1ede0b1-S0 (10.0.0.52) updated with oversubscribed 
resources  (total: ports(
Feb 11 00:23:01 ip-10-0-4-187.us-west-2.compute.internal mesos-master[1362]: 
I0211 00:23:01.980257  1369 leveldb.cpp:343] Persisting action (18 bytes) to 
leveldb took 1.514614ms
{noformat}

h2. Lose the slave
{noformat}
Feb 11 00:32:12 ip-10-0-4-187.us-west-2.compute.internal mesos-master[1362]: 
I0211 00:32:12.578547  1368 master.cpp:1083] Slave 
0c9ebb3f-23f8-4fce-b276-9ebca1ede0b1-S0 at slave(1)@10.0.0.52:5051 (10.0.0.52) 
disconnected
Feb 11 00:32:12 ip-10-0-4-187.us-west-2.compute.internal mesos-master[1362]: 
I0211 00:32:12.578627  1368 master.cpp:2531] Disconnecting slave 
0c9ebb3f-23f8-4fce-b276-9ebca1ede0b1-S0 at slave(1)@10.0.0.52:5051 (10.0.0.52)
Feb 11 00:32:12 ip-10-0-4-187.us-west-2.compute.internal mesos-master[1362]: 
I0211 00:32:12.578673  1368 master.cpp:2550] Deactivating slave 
0c9ebb3f-23f8-4fce-b276-9ebca1ede0b1-S0 at slave(1)@10.0.0.52:5051 (10.0.0.52)
Feb 11 00:32:12 ip-10-0-4-187.us-west-2.compute.internal mesos-master[1362]: 
I0211 00:32:12.578764  1374 hierarchical.cpp:429] Slave 
0c9ebb3f-23f8-4fce-b276-9ebca1ede0b1-S0 deactivated
{noformat}

h2. Slave came back (earlier restart, only gone for seconds)
{noformat}
Feb 11 00:32:15 ip-10-0-4-187.us-west-2.compute.internal mesos-master[1362]: 
I0211 00:32:15.965806  1370 master.cpp:4019] Re-registering slave 
0c9ebb3f-23f8-4fce-b276-9ebca1ede0b1-S0 at slave(1)@10.0.0.52:5051 (10.0.0.52)
Feb 11 00:32:15 ip-10-0-4-187.us-west-2.compute.internal mesos-master[1362]: 
I0211 00:32:15.966354  1373 hierarchical.cpp:417] Slave 
0c9ebb3f-23f8-4fce-b276-9ebca1ede0b1-S0 reactivated
Feb 11 00:32:15 ip-10-0-4-187.us-west-2.compute.internal mesos-master[1362]: 
I0211 00:32:15.966419  1370 master.cpp:4207] Sending updated checkpointed 
resources  to slave 0c9ebb3f-23f8-4fce-b276-9ebca1ede0b1-S0 at 
slave(1)@10.0.0.52:5051 
Feb 11 00:32:15 ip-10-0-4-187.us-west-2.compute.internal mesos-master[1362]: 
I0211 00:32:15.967167  1371 master.cpp:4269] Received update of slave 
0c9ebb3f-23f8-4fce-b276-9ebca1ede0b1-S0 at slave(1)@10.0.0.52:5051 (10.0.0.52) 
with total o
Feb 11 00:32:15 ip-10-0-4-187.us-west-2.compute.internal mesos-master[1362]: 
I0211 00:32:15.967296  1371 hierarchical.cpp:400] Slave 
0c9ebb3f-23f8-4fce-b276-9ebca1ede0b1-S0 (10.0.0.52) updated with oversubscribed 
resources  (total: ports(
{noformat}

h2. The slave disconnect from this shutdown
{noformat}
Feb 11 00:32:44 ip-10-0-4-187.us-west-2.compute.internal mesos-master[1362]: 
I0211 00:32:44.142541  1371 http.cpp:334] HTTP GET for /master/state-summary 
from 10.0.4.187:44274 with User-Agent='Mozilla/5.0 (X11; Linux x86_64) 
AppleWebKit/5
Feb 11 00:32:44 ip-10-0-4-187.us-west-2.compute.internal mesos-master[1362]: 
I0211 00:32:44.150949  1368 master.cpp:1083] Slave 
0c9ebb3f-23f8-4fce-b276-9ebca1ede0b1-S0 at slave(1)@10.0.0.52:5051 (10.0.0.52) 
disconnected
Feb 11 00:32:44 ip-10-0-4-187.us-west-2.compute.internal mesos-master[1362]: 
I0211 00:32:44.151002  1368 master.cpp:2531] Disconnecting slave 
0c9ebb3f-23f8-4fce-b276-9ebca1ede0b1-S0 at slave(1)@10.0.0.52:5051 (10.0.0.52)
Feb 11 00:32:44 ip-10-0-4-187.us-west-2.compute.internal mesos-master[1362]: 
I0211 00:32:44.151048  1368 master.cpp:2550] Deactivating slave 
0c9ebb3f-23f8-4fce-b276-9ebca1ede0b1-S0 at slave(1)@10.0.0.52:5051 (10.0.0.52)
Feb 11 00:32:44 ip-10-0-4-187.us-west-2.compute.internal mesos-master[1362]: 
I0211 00:32:44.151113  1368 hierarchical.cpp:429] Slave 
0c9ebb3f-23f8-4fce-b276-9ebca1ede0b1-S0 deactivated
{noformat}


h2. Slave lost (the critical part): the slave should be marked lost at health 
check timeout, not shut down
{noformat}
Feb 11 00:33:47 ip-10-0-4-187.us-west-2.compute.internal mesos-master[1362]: 
I0211 00:33:47.009037  1372 master.cpp:236] Shutting down slave 
0c9ebb3f-23f8-4fce-b276-9ebca1ede0b1-S0 due to health check timeout
Feb 11 00:33:47 ip-10-0-4-187.us-west-2.compute.internal mesos-master[1362]: 
W0211 00:33:47.009124  1372 master.cpp:4581] Shutting down slave 
0c9ebb3f-23f8-4fce-b276-9ebca1ede0b1-S0 at slave(1)@10.0.0.52:5051 (10.0.0.52) 
with message 'hea
Feb 11 00:33:47 ip-10-0-4-187.us-west-2.compute.internal mesos-master[1362]: 
I0211 00:33:47.009181  1372 master.cpp:5846] Removing slave 
0c9ebb3f-23f8-4fce-b276-9ebca1ede0b1-S0 at slave(1)@10.0.0.52:5051 (10.0.0.52): 
health check timed ou
Feb 11 00:33:47 ip-10-0-4-187.us-west-2.compute.internal mesos-master[1362]: 
I0211 00:33:47.009297  1372 master.cpp:6066] Updating the state of task 
test-app-2.4057f89f-d056-11e5-8aeb-0242d6f35f4b of framework 
0c9ebb3f-23f8-4fce-b276-9ebc
Feb 11 00:33:47 ip-10-0-4-187.us-west-2.compute.internal mesos-master[1362]: 
I0211 00:33:47.009353  1369 hierarchical.cpp:373] Removed slave 
0c9ebb3f-23f8-4fce-b276-9ebca1ede0b1-S0
{noformat}
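The timestamps bear this out: measuring from the disconnect to the removal gives roughly one ping window, nowhere near the 10-minute re-registration timeout (simple arithmetic over the quoted log lines):

```python
from datetime import datetime

# Timestamps copied from the master log lines quoted above.
disconnected = datetime.strptime("00:32:44.150949", "%H:%M:%S.%f")
removed = datetime.strptime("00:33:47.009181", "%H:%M:%S.%f")

elapsed = (removed - disconnected).total_seconds()
print(round(elapsed))  # ~63 seconds from disconnect to removal
```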

h2. Tasks marked as slave-lost
{noformat}
2] Removing task test-app.4076cb59-d056-11e5-8aeb-0242d6f35f4b with resources 
cpus(*):0.1; mem(*):16; ports(*):[2791-2791] of framework 
0c9ebb3f-23f8-4fce-b276-9ebca1ede0b1-0000 on slave 
0c9ebb3f-23f8-4fce-b276-9ebca1ede0b1-S0 at slave(1)
6] Updating the state of task test-app.40756bc5-d056-11e5-8aeb-0242d6f35f4b of 
framework 0c9ebb3f-23f8-4fce-b276-9ebca1ede0b1-0000 (latest state: TASK_LOST, 
status update state: TASK_LOST)
2] Removing task test-app.40756bc5-d056-11e5-8aeb-0242d6f35f4b with resources 
cpus(*):0.1; mem(*):16; ports(*):[6724-6724] of framework 
0c9ebb3f-23f8-4fce-b276-9ebca1ede0b1-0000 on slave 
0c9ebb3f-23f8-4fce-b276-9ebca1ede0b1-S0 at slave(1)
6] Updating the state of task test-app-2.40765628-d056-11e5-8aeb-0242d6f35f4b 
of framework 0c9ebb3f-23f8-4fce-b276-9ebca1ede0b1-0000 (latest state: 
TASK_LOST, status update state: TASK_LOST)
{noformat}

h2. Slave gone gone
{noformat}
Feb 11 00:33:47 ip-10-0-4-187.us-west-2.compute.internal mesos-master[1362]: 
I0211 00:33:47.021023  1374 master.cpp:5965] Removed slave 
0c9ebb3f-23f8-4fce-b276-9ebca1ede0b1-S0 (10.0.0.52): health check timed out
{noformat}

h2. Master refuses to accept slave
{noformat}
Feb 11 00:34:05 ip-10-0-4-187.us-west-2.compute.internal mesos-master[1362]: 
W0211 00:34:05.614985  1368 master.cpp:3997] Slave 
0c9ebb3f-23f8-4fce-b276-9ebca1ede0b1-S0 at slave(1)@10.0.0.52:5051 (10.0.0.52) 
attempted to re-register after 
{noformat}

h2. Slave comes up with new id, registers properly 
{noformat}
Feb 11 00:34:13 ip-10-0-4-187.us-west-2.compute.internal mesos-master[1362]: 
I0211 00:34:13.757870  1368 master.cpp:3859] Registering slave at 
slave(1)@10.0.0.52:5051 (10.0.0.52) with id 
0c9ebb3f-23f8-4fce-b276-9ebca1ede0b1-S1
Feb 11 00:34:13 ip-10-0-4-187.us-west-2.compute.internal mesos-master[1362]: 
I0211 00:34:13.758057  1372 registrar.cpp:441] Applied 1 operations in 23020ns; 
attempting to update the 'registry'
Feb 11 00:34:13 ip-10-0-4-187.us-west-2.compute.internal mesos-master[1362]: 
I0211 00:34:13.758257  1368 log.cpp:685] Attempting to append 367 bytes to the 
log
Feb 11 00:34:13 ip-10-0-4-187.us-west-2.compute.internal mesos-master[1362]: 
I0211 00:34:13.758316  1368 coordinator.cpp:350] Coordinator attempting to 
write APPEND action at position 7
Feb 11 00:34:13 ip-10-0-4-187.us-west-2.compute.internal mesos-master[1362]: 
I0211 00:34:13.758450  1368 replica.cpp:540] Replica received write request for 
position 7 from (75)@10.0.4.187:5050
Feb 11 00:34:13 ip-10-0-4-187.us-west-2.compute.internal mesos-master[1362]: 
I0211 00:34:13.759891  1368 leveldb.cpp:343] Persisting action (386 bytes) to 
leveldb took 1.411937ms
Feb 11 00:34:13 ip-10-0-4-187.us-west-2.compute.internal mesos-master[1362]: 
I0211 00:34:13.759927  1368 replica.cpp:715] Persisted action at 7
Feb 11 00:34:13 ip-10-0-4-187.us-west-2.compute.internal mesos-master[1362]: 
I0211 00:34:13.760097  1368 replica.cpp:694] Replica received learned notice 
for position 7 from @0.0.0.0:0
Feb 11 00:34:13 ip-10-0-4-187.us-west-2.compute.internal mesos-master[1362]: 
I0211 00:34:13.763203  1368 leveldb.cpp:343] Persisting action (388 bytes) to 
leveldb took 3.072892ms
Feb 11 00:34:13 ip-10-0-4-187.us-west-2.compute.internal mesos-master[1362]: 
I0211 00:34:13.763236  1368 replica.cpp:715] Persisted action at 7
Feb 11 00:34:13 ip-10-0-4-187.us-west-2.compute.internal mesos-master[1362]: 
I0211 00:34:13.763250  1368 replica.cpp:700] Replica learned APPEND action at 
position 7
{noformat}




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
