[ https://issues.apache.org/jira/browse/MESOS-6137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Greg Mann updated MESOS-6137:
-----------------------------
Description:
Observed on Jenkins CI:
{code}
I0906 20:01:45.235483 29082 master.cpp:379] Master 9fd91e5d-4257-427d-a7da-3f18d99c8ffa (0a1dc2da838b) started on 172.17.0.3:60366
I0906 20:01:45.235513 29082 master.cpp:381] Flags at startup: --acls="" --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" --allocation_interval="1secs" --allocator="HierarchicalDRF" --authenticate_agents="true" --authenticate_frameworks="true" --authenticate_http_frameworks="true" --authenticate_http_readonly="true" --authenticate_http_readwrite="true" --authenticators="crammd5" --authorizers="local" --credentials="/tmp/ze1TG1/credentials" --framework_sorter="drf" --help="false" --hostname_lookup="true" --http_authenticators="basic" --http_framework_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --quiet="false" --recovery_agent_removal_limit="100%" --registry="replicated_log" --registry_fetch_timeout="1mins" --registry_store_timeout="100secs" --registry_strict="true" --root_submissions="true" --user_sorter="drf" --version="false" --webui_dir="/mesos/mesos-1.1.0/_inst/share/mesos/webui" --work_dir="/tmp/ze1TG1/master" --zk_session_timeout="10secs"
I0906 20:01:45.236022 29082 master.cpp:431] Master only allowing authenticated frameworks to register
I0906 20:01:45.236037 29082 master.cpp:445] Master only allowing authenticated agents to register
I0906 20:01:45.236045 29082 master.cpp:458] Master only allowing authenticated HTTP frameworks to register
I0906 20:01:45.236054 29082 credentials.hpp:37] Loading credentials for authentication from '/tmp/ze1TG1/credentials'
I0906 20:01:45.236392 29082 master.cpp:503] Using default 'crammd5' authenticator
I0906 20:01:45.236654 29079 replica.cpp:673] Replica in STARTING status received a broadcasted
recover request from __req_res__(6359)@172.17.0.3:60366 I0906 20:01:45.236687 29082 http.cpp:883] Using default 'basic' HTTP authenticator for realm 'mesos-master-readonly' I0906 20:01:45.236927 29082 http.cpp:883] Using default 'basic' HTTP authenticator for realm 'mesos-master-readwrite' I0906 20:01:45.237095 29079 recover.cpp:197] Received a recover response from a replica in STARTING status I0906 20:01:45.237117 29082 http.cpp:883] Using default 'basic' HTTP authenticator for realm 'mesos-master-scheduler' I0906 20:01:45.237340 29082 master.cpp:583] Authorization enabled I0906 20:01:45.237663 29080 whitelist_watcher.cpp:77] No whitelist given I0906 20:01:45.237685 29075 hierarchical.cpp:149] Initialized hierarchical allocator process I0906 20:01:45.237835 29085 recover.cpp:568] Updating replica status to VOTING I0906 20:01:45.238531 29081 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 378674ns I0906 20:01:45.238560 29081 replica.cpp:320] Persisted replica status to VOTING I0906 20:01:45.238685 29073 recover.cpp:582] Successfully joined the Paxos group I0906 20:01:45.238975 29073 recover.cpp:466] Recover process terminated I0906 20:01:45.240437 29078 master.cpp:1850] Elected as the leading master! 
I0906 20:01:45.240468 29078 master.cpp:1551] Recovering from registrar I0906 20:01:45.240592 29080 registrar.cpp:332] Recovering registrar I0906 20:01:45.241178 29075 log.cpp:553] Attempting to start the writer I0906 20:01:45.242928 29072 replica.cpp:493] Replica received implicit promise request from __req_res__(6360)@172.17.0.3:60366 with proposal 1 I0906 20:01:45.243324 29072 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 335676ns I0906 20:01:45.243350 29072 replica.cpp:342] Persisted promised to 1 I0906 20:01:45.244056 29081 coordinator.cpp:238] Coordinator attempting to fill missing positions I0906 20:01:45.245538 29078 replica.cpp:388] Replica received explicit promise request from __req_res__(6361)@172.17.0.3:60366 for position 0 with proposal 2 I0906 20:01:45.245995 29078 leveldb.cpp:341] Persisting action (8 bytes) to leveldb took 412163ns I0906 20:01:45.246021 29078 replica.cpp:708] Persisted action NOP at position 0 I0906 20:01:45.247329 29082 replica.cpp:537] Replica received write request for position 0 from __req_res__(6362)@172.17.0.3:60366 I0906 20:01:45.247406 29082 leveldb.cpp:436] Reading position from leveldb took 35845ns I0906 20:01:45.247989 29082 leveldb.cpp:341] Persisting action (14 bytes) to leveldb took 541972ns I0906 20:01:45.248015 29082 replica.cpp:708] Persisted action NOP at position 0 I0906 20:01:45.248556 29084 replica.cpp:691] Replica received learned notice for position 0 from @0.0.0.0:0 I0906 20:01:45.249241 29084 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 647885ns I0906 20:01:45.249271 29084 replica.cpp:708] Persisted action NOP at position 0 I0906 20:01:45.249914 29085 log.cpp:569] Writer started with ending position 0 I0906 20:01:45.251022 29085 leveldb.cpp:436] Reading position from leveldb took 31388ns I0906 20:01:45.252149 29082 registrar.cpp:365] Successfully fetched the registry (0B) in 11.51104ms I0906 20:01:45.252271 29082 registrar.cpp:464] Applied 1 operations in 21341ns; attempting 
to update the registry I0906 20:01:45.253073 29078 log.cpp:577] Attempting to append 168 bytes to the log I0906 20:01:45.253250 29081 coordinator.cpp:348] Coordinator attempting to write APPEND action at position 1 I0906 20:01:45.254175 29070 replica.cpp:537] Replica received write request for position 1 from __req_res__(6363)@172.17.0.3:60366 I0906 20:01:45.254654 29070 leveldb.cpp:341] Persisting action (187 bytes) to leveldb took 435222ns I0906 20:01:45.254683 29070 replica.cpp:708] Persisted action APPEND at position 1 I0906 20:01:45.255455 29080 replica.cpp:691] Replica received learned notice for position 1 from @0.0.0.0:0 I0906 20:01:45.255926 29080 leveldb.cpp:341] Persisting action (189 bytes) to leveldb took 431510ns I0906 20:01:45.255980 29080 replica.cpp:708] Persisted action APPEND at position 1 I0906 20:01:45.257114 29073 registrar.cpp:509] Successfully updated the registry in 4.780032ms I0906 20:01:45.257305 29073 registrar.cpp:395] Successfully recovered registrar I0906 20:01:45.257380 29082 log.cpp:596] Attempting to truncate the log to 1 I0906 20:01:45.257515 29076 coordinator.cpp:348] Coordinator attempting to write TRUNCATE action at position 2 I0906 20:01:45.258153 29071 master.cpp:1659] Recovered 0 agents from the registry (129B); allowing 10mins for agents to re-register I0906 20:01:45.258191 29077 hierarchical.cpp:176] Skipping recovery of hierarchical allocator: nothing to recover I0906 20:01:45.258608 29082 replica.cpp:537] Replica received write request for position 2 from __req_res__(6364)@172.17.0.3:60366 I0906 20:01:45.259039 29082 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 388229ns I0906 20:01:45.259068 29082 replica.cpp:708] Persisted action TRUNCATE at position 2 I0906 20:01:45.259778 29071 replica.cpp:691] Replica received learned notice for position 2 from @0.0.0.0:0 I0906 20:01:45.260226 29071 leveldb.cpp:341] Persisting action (18 bytes) to leveldb took 411069ns I0906 20:01:45.260299 29071 leveldb.cpp:399] 
Deleting ~1 keys from leveldb took 40611ns I0906 20:01:45.260321 29071 replica.cpp:708] Persisted action TRUNCATE at position 2 I0906 20:01:45.266494 29085 slave.cpp:205] Mesos agent started on @172.17.0.3:60366 I0906 20:01:45.266513 29085 slave.cpp:206] Flags at startup: --acls="" --appc_simple_discovery_uri_prefix="http://" --appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="true" --authenticate_http_readwrite="true" --authenticatee="crammd5" --authentication_backoff_factor="1secs" --authorizer="local" --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" --cgroups_root="mesos" --container_disk_watch_interval="15secs" --containerizers="mesos" --credential="/tmp/DiskResource_PersistentVolumeTest_IncompatibleCheckpointedResources_0_SRjbqt/credential" --default_role="*" --disk_watch_interval="1mins" --docker="docker" --docker_kill_orphans="true" --docker_registry="https://registry-1.docker.io" --docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns" --docker_store_dir="/tmp/mesos/store/docker" --docker_volume_checkpoint_dir="/var/run/mesos/isolators/docker/volume" --enforce_container_disk_quota="false" --executor_registration_timeout="1mins" --executor_shutdown_grace_period="5secs" --fetcher_cache_dir="/tmp/DiskResource_PersistentVolumeTest_IncompatibleCheckpointedResources_0_SRjbqt/fetch" --fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1" --hadoop_home="" --help="false" --hostname_lookup="true" --http_authenticators="basic" --http_command_executor="false" --http_credentials="/tmp/DiskResource_PersistentVolumeTest_IncompatibleCheckpointedResources_0_SRjbqt/http_credentials" --image_provisioner_backend="copy" --initialize_driver_logging="true" --isolation="posix/cpu,posix/mem" --launcher_dir="/mesos/mesos-1.1.0/_build/src" --logbufsecs="0" --logging_level="INFO" 
--oversubscribed_resources_interval="15secs" --perf_duration="10secs" --perf_interval="1mins" --qos_correction_interval_min="0ns" --quiet="false" --recover="reconnect" --recovery_timeout="15mins" --registration_backoff_factor="10ms" --resources="[{"name":"cpus","role":"*","scalar":{"value":2.0},"type":"SCALAR"},{"name":"mem","role":"*","scalar":{"value":2048.0},"type":"SCALAR"},{"name":"disk","role":"role1","scalar":{"value":4096.0},"type":"SCALAR"}]" --revocable_cpu_low_priority="true" --runtime_dir="/tmp/DiskResource_PersistentVolumeTest_IncompatibleCheckpointedResources_0_SRjbqt" --sandbox_directory="/mnt/mesos/sandbox" --strict="true" --switch_user="true" --systemd_enable_support="true" --systemd_runtime_directory="/run/systemd/system" --version="false" --work_dir="/tmp/DiskResource_PersistentVolumeTest_IncompatibleCheckpointedResources_0_DFKGtZ" I0906 20:01:45.266980 29085 credentials.hpp:86] Loading credential for authentication from '/tmp/DiskResource_PersistentVolumeTest_IncompatibleCheckpointedResources_0_SRjbqt/credential' I0906 20:01:45.267125 29085 slave.cpp:343] Agent using credential for: test-principal I0906 20:01:45.267143 29085 credentials.hpp:37] Loading credentials for authentication from '/tmp/DiskResource_PersistentVolumeTest_IncompatibleCheckpointedResources_0_SRjbqt/http_credentials' I0906 20:01:45.267366 29085 http.cpp:883] Using default 'basic' HTTP authenticator for realm 'mesos-agent-readonly' I0906 20:01:45.267477 29085 http.cpp:883] Using default 'basic' HTTP authenticator for realm 'mesos-agent-readwrite' I0906 20:01:45.267544 29051 sched.cpp:226] Version: 1.1.0 I0906 20:01:45.268095 29074 sched.cpp:330] New master detected at master@172.17.0.3:60366 I0906 20:01:45.268167 29074 sched.cpp:396] Authenticating with master master@172.17.0.3:60366 I0906 20:01:45.268182 29074 sched.cpp:403] Using default CRAM-MD5 authenticatee I0906 20:01:45.268357 29078 authenticatee.cpp:121] Creating new client SASL connection I0906 20:01:45.268568 29077 
master.cpp:6167] Authenticating scheduler-d55313b3-c4cf-4517-843c-56aa3f74d9f7@172.17.0.3:60366 I0906 20:01:45.268654 29076 authenticator.cpp:414] Starting authentication session for crammd5-authenticatee(1048)@172.17.0.3:60366 I0906 20:01:45.268726 29085 slave.cpp:526] Agent resources: cpus(*):2; mem(*):2048; disk(role1):4096; ports(*):[31000-32000] I0906 20:01:45.268831 29085 slave.cpp:534] Agent attributes: [ ] I0906 20:01:45.268847 29085 slave.cpp:539] Agent hostname: 0a1dc2da838b I0906 20:01:45.268853 29080 authenticator.cpp:98] Creating new server SASL connection I0906 20:01:45.269053 29071 authenticatee.cpp:213] Received SASL authentication mechanisms: CRAM-MD5 I0906 20:01:45.269075 29071 authenticatee.cpp:239] Attempting to authenticate with mechanism 'CRAM-MD5' I0906 20:01:45.269160 29077 authenticator.cpp:204] Received SASL authentication start I0906 20:01:45.269218 29077 authenticator.cpp:326] Authentication requires more steps I0906 20:01:45.269314 29079 authenticatee.cpp:259] Received SASL authentication step I0906 20:01:45.269420 29081 authenticator.cpp:232] Received SASL authentication step I0906 20:01:45.269450 29081 auxprop.cpp:109] Request to lookup properties for user: 'test-principal' realm: '0a1dc2da838b' server FQDN: '0a1dc2da838b' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false I0906 20:01:45.269464 29081 auxprop.cpp:181] Looking up auxiliary property '*userPassword' I0906 20:01:45.269490 29081 auxprop.cpp:181] Looking up auxiliary property '*cmusaslsecretCRAM-MD5' I0906 20:01:45.269506 29081 auxprop.cpp:109] Request to lookup properties for user: 'test-principal' realm: '0a1dc2da838b' server FQDN: '0a1dc2da838b' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true I0906 20:01:45.269515 29081 auxprop.cpp:131] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true I0906 20:01:45.269521 29081 auxprop.cpp:131] Skipping auxiliary 
property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true I0906 20:01:45.269534 29081 authenticator.cpp:318] Authentication success I0906 20:01:45.269620 29070 authenticatee.cpp:299] Authentication success I0906 20:01:45.269661 29084 master.cpp:6197] Successfully authenticated principal 'test-principal' at scheduler-d55313b3-c4cf-4517-843c-56aa3f74d9f7@172.17.0.3:60366 I0906 20:01:45.269729 29074 authenticator.cpp:432] Authentication session cleanup for crammd5-authenticatee(1048)@172.17.0.3:60366 I0906 20:01:45.269861 29071 sched.cpp:502] Successfully authenticated with master master@172.17.0.3:60366 I0906 20:01:45.269877 29071 sched.cpp:820] Sending SUBSCRIBE call to master@172.17.0.3:60366 I0906 20:01:45.269948 29071 sched.cpp:853] Will retry registration in 1.200847472secs if necessary I0906 20:01:45.270069 29084 master.cpp:2424] Received SUBSCRIBE call for framework 'default' at scheduler-d55313b3-c4cf-4517-843c-56aa3f74d9f7@172.17.0.3:60366 I0906 20:01:45.270113 29084 master.cpp:1886] Authorizing framework principal 'test-principal' to receive offers for role 'role1' I0906 20:01:45.270314 29072 state.cpp:57] Recovering state from '/tmp/DiskResource_PersistentVolumeTest_IncompatibleCheckpointedResources_0_DFKGtZ/meta' I0906 20:01:45.270467 29070 master.cpp:2500] Subscribing framework default with checkpointing disabled and capabilities [ ] I0906 20:01:45.270505 29075 status_update_manager.cpp:203] Recovering status update manager I0906 20:01:45.270777 29081 slave.cpp:4887] Finished recovery I0906 20:01:45.270908 29074 sched.cpp:743] Framework registered with 9fd91e5d-4257-427d-a7da-3f18d99c8ffa-0000 I0906 20:01:45.270942 29074 sched.cpp:757] Scheduler::registered took 15584ns I0906 20:01:45.270970 29084 hierarchical.cpp:269] Added framework 9fd91e5d-4257-427d-a7da-3f18d99c8ffa-0000 I0906 20:01:45.271028 29084 hierarchical.cpp:1550] No allocations performed I0906 20:01:45.271051 29084 hierarchical.cpp:1645] No inverse offers to send out! 
I0906 20:01:45.271092 29084 hierarchical.cpp:1194] Performed allocation for 0 agents in 102494ns I0906 20:01:45.271229 29081 slave.cpp:5059] Querying resource estimator for oversubscribable resources I0906 20:01:45.271414 29075 status_update_manager.cpp:177] Pausing sending status updates I0906 20:01:45.271414 29081 slave.cpp:902] New master detected at master@172.17.0.3:60366 I0906 20:01:46.238718 29073 hierarchical.cpp:1550] No allocations performed I0906 20:02:00.269846 29071 master.cpp:1288] Framework 9fd91e5d-4257-427d-a7da-3f18d99c8ffa-0000 (default) at scheduler-d55313b3-c4cf-4517-843c-56aa3f74d9f7@172.17.0.3:60366 disconnected I0906 20:02:07.263937 29073 hierarchical.cpp:1645] No inverse offers to send out! I0906 20:02:07.263902 29071 master.cpp:2725] Disconnecting framework 9fd91e5d-4257-427d-a7da-3f18d99c8ffa-0000 (default) at scheduler-d55313b3-c4cf-4517-843c-56aa3f74d9f7@172.17.0.3:60366 I0906 20:02:07.264065 29071 master.cpp:2749] Deactivating framework 9fd91e5d-4257-427d-a7da-3f18d99c8ffa-0000 (default) at scheduler-d55313b3-c4cf-4517-843c-56aa3f74d9f7@172.17.0.3:60366 I0906 20:02:07.264094 29073 hierarchical.cpp:1194] Performed allocation for 0 agents in 21.025474006secs I0906 20:02:07.264175 29071 master.cpp:1301] Giving framework 9fd91e5d-4257-427d-a7da-3f18d99c8ffa-0000 (default) at scheduler-d55313b3-c4cf-4517-843c-56aa3f74d9f7@172.17.0.3:60366 0ns to failover I0906 20:02:07.264307 29073 hierarchical.cpp:380] Deactivated framework 9fd91e5d-4257-427d-a7da-3f18d99c8ffa-0000 I0906 20:02:07.264336 29071 master.cpp:1096] Master terminating *** Aborted at 1473192127 (unix time) try "date -d @1473192127" if you are using GNU date *** PC: @ 0x2ac83e9ac40b (unknown) *** SIGSEGV (@0x2ac880049000) received by PID 29051 (TID 0x2ac848bc4700) from PID 18446744071562366976; stack trace: *** @ 0x2ac8947d62c7 (unknown) @ 0x2ac8947da5a9 (unknown) I0906 20:02:07.269142 29085 hierarchical.cpp:331] Removed framework 9fd91e5d-4257-427d-a7da-3f18d99c8ffa-0000 @ 
0x2ac83f13f330 (unknown) @ 0x2ac83e9ac40b (unknown) @ 0x2ac83e9a3c05 (unknown) I0906 20:02:07.274950 29051 cluster.cpp:157] Creating default 'local' authorizer @ 0x2ac83d042c98 process::operator<<() I0906 20:02:07.277822 29051 leveldb.cpp:174] Opened db in 2.422111ms I0906 20:02:07.279304 29051 leveldb.cpp:181] Compacted db in 1.434065ms I0906 20:02:07.279400 29051 leveldb.cpp:196] Created db iterator in 26692ns I0906 20:02:07.279427 29051 leveldb.cpp:202] Seeked to beginning of db in 2257ns I0906 20:02:07.279448 29051 leveldb.cpp:271] Iterated through 0 keys in the db in 362ns I0906 20:02:07.279505 29051 replica.cpp:776] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned I0906 20:02:07.280604 29079 recover.cpp:451] Starting replica recovery I0906 20:02:07.281153 29079 recover.cpp:477] Replica is in EMPTY status I0906 20:02:07.282649 29071 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from __req_res__(6365)@172.17.0.3:60366 I0906 20:02:07.283185 29076 recover.cpp:197] Received a recover response from a replica in EMPTY status I0906 20:02:07.283640 29070 recover.cpp:568] Updating replica status to STARTING I0906 20:02:07.284180 29071 master.cpp:379] Master f6076bbd-3be2-4c01-b593-d50e2743a2c9 (0a1dc2da838b) started on 172.17.0.3:60366 I0906 20:02:07.284554 29075 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 654887ns I0906 20:02:07.284205 29071 master.cpp:381] Flags at startup: --acls="" --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" --allocation_interval="1secs" --allocator="HierarchicalDRF" --authenticate_agents="true" --authenticate_frameworks="true" --authenticate_http_frameworks="true" --authenticate_http_readonly="true" --authenticate_http_readwrite="true" --authenticators="crammd5" --authorizers="local" --credentials="/tmp/WfTwZm/credentials" --framework_sorter="drf" --help="false" --hostname_lookup="true" --http_authenticators="basic" 
--http_framework_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --quiet="false" --recovery_agent_removal_limit="100%" --registry="replicated_log" --registry_fetch_timeout="1mins" --registry_store_timeout="100secs" --registry_strict="true" --root_submissions="true" --user_sorter="drf" --version="false" --webui_dir="/mesos/mesos-1.1.0/_inst/share/mesos/webui" --work_dir="/tmp/WfTwZm/master" --zk_session_timeout="10secs" I0906 20:02:07.284587 29075 replica.cpp:320] Persisted replica status to STARTING I0906 20:02:07.284613 29071 master.cpp:431] Master only allowing authenticated frameworks to register I0906 20:02:07.284627 29071 master.cpp:445] Master only allowing authenticated agents to register I0906 20:02:07.284636 29071 master.cpp:458] Master only allowing authenticated HTTP frameworks to register I0906 20:02:07.284644 29071 credentials.hpp:37] Loading credentials for authentication from '/tmp/WfTwZm/credentials' I0906 20:02:07.284814 29078 recover.cpp:477] Replica is in STARTING status I0906 20:02:07.284943 29071 master.cpp:503] Using default 'crammd5' authenticator I0906 20:02:07.285138 29071 http.cpp:883] Using default 'basic' HTTP authenticator for realm 'mesos-master-readonly' I0906 20:02:07.285303 29071 http.cpp:883] Using default 'basic' HTTP authenticator for realm 'mesos-master-readwrite' I0906 20:02:07.285500 29071 http.cpp:883] Using default 'basic' HTTP authenticator for realm 'mesos-master-scheduler' I0906 20:02:07.285640 29071 master.cpp:583] Authorization enabled I0906 20:02:07.285848 29072 whitelist_watcher.cpp:77] No whitelist given I0906 20:02:07.286067 29083 hierarchical.cpp:149] Initialized hierarchical allocator process I0906 20:02:07.286173 29073 replica.cpp:673] Replica in STARTING status received a broadcasted recover request from 
__req_res__(6366)@172.17.0.3:60366 I0906 20:02:07.286520 29082 recover.cpp:197] Received a recover response from a replica in STARTING status I0906 20:02:07.287076 29073 recover.cpp:568] Updating replica status to VOTING I0906 20:02:07.287904 29084 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 597451ns I0906 20:02:07.287938 29084 replica.cpp:320] Persisted replica status to VOTING I0906 20:02:07.288169 29076 recover.cpp:582] Successfully joined the Paxos group I0906 20:02:07.288481 29076 recover.cpp:466] Recover process terminated I0906 20:02:07.289659 29084 master.cpp:1850] Elected as the leading master! I0906 20:02:07.289693 29084 master.cpp:1551] Recovering from registrar I0906 20:02:07.289862 29079 registrar.cpp:332] Recovering registrar I0906 20:02:07.290505 29075 log.cpp:553] Attempting to start the writer I0906 20:02:07.292006 29074 replica.cpp:493] Replica received implicit promise request from __req_res__(6367)@172.17.0.3:60366 with proposal 1 @ 0x2ac83c44fac3 mesos::internal::slave::Slave::authenticate() I0906 20:02:07.292558 29074 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 508694ns I0906 20:02:07.292584 29074 replica.cpp:342] Persisted promised to 1 I0906 20:02:07.293391 29080 coordinator.cpp:238] Coordinator attempting to fill missing positions I0906 20:02:07.294734 29073 replica.cpp:388] Replica received explicit promise request from __req_res__(6368)@172.17.0.3:60366 for position 0 with proposal 2 I0906 20:02:07.295254 29073 leveldb.cpp:341] Persisting action (8 bytes) to leveldb took 472361ns I0906 20:02:07.295285 29073 replica.cpp:708] Persisted action NOP at position 0 I0906 20:02:07.296751 29076 replica.cpp:537] Replica received write request for position 0 from __req_res__(6369)@172.17.0.3:60366 I0906 20:02:07.296835 29076 leveldb.cpp:436] Reading position from leveldb took 39744ns I0906 20:02:07.297452 29076 leveldb.cpp:341] Persisting action (14 bytes) to leveldb took 554740ns I0906 20:02:07.297485 29076 
replica.cpp:708] Persisted action NOP at position 0 I0906 20:02:07.298262 29083 replica.cpp:691] Replica received learned notice for position 0 from @0.0.0.0:0 I0906 20:02:07.298765 29083 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 460819ns I0906 20:02:07.298796 29083 replica.cpp:708] Persisted action NOP at position 0 @ 0x2ac83c44f56b mesos::internal::slave::Slave::detected() I0906 20:02:07.299576 29085 log.cpp:569] Writer started with ending position 0 I0906 20:02:07.300812 29071 leveldb.cpp:436] Reading position from leveldb took 31797ns I0906 20:02:07.301996 29073 registrar.cpp:365] Successfully fetched the registry (0B) in 12.048896ms I0906 20:02:07.302140 29073 registrar.cpp:464] Applied 1 operations in 32924ns; attempting to update the registry I0906 20:02:07.303042 29078 log.cpp:577] Attempting to append 168 bytes to the log I0906 20:02:07.303190 29079 coordinator.cpp:348] Coordinator attempting to write APPEND action at position 1 @ 0x2ac83c4a5d03 _ZZN7process8dispatchIN5mesos8internal5slave5SlaveERKNS_6FutureI6OptionINS1_10MasterInfoEEEES9_EEvRKNS_3PIDIT_EEMSD_FvT0_ET1_ENKUlPNS_11ProcessBaseEE_clESM_ I0906 20:02:07.304149 29076 replica.cpp:537] Replica received write request for position 1 from __req_res__(6370)@172.17.0.3:60366 I0906 20:02:07.304754 29076 leveldb.cpp:341] Persisting action (187 bytes) to leveldb took 546211ns I0906 20:02:07.304786 29076 replica.cpp:708] Persisted action APPEND at position 1 I0906 20:02:07.305613 29078 replica.cpp:691] Replica received learned notice for position 1 from @0.0.0.0:0 I0906 20:02:07.306145 29078 leveldb.cpp:341] Persisting action (189 bytes) to leveldb took 490605ns I0906 20:02:07.306182 29078 replica.cpp:708] Persisted action APPEND at position 1 I0906 20:02:07.307394 29070 registrar.cpp:509] Successfully updated the registry in 5.172736ms I0906 20:02:07.307579 29070 registrar.cpp:395] Successfully recovered registrar I0906 20:02:07.307659 29085 log.cpp:596] Attempting to truncate the log 
to 1 I0906 20:02:07.307802 29073 coordinator.cpp:348] Coordinator attempting to write TRUNCATE action at position 2 I0906 20:02:07.308280 29072 master.cpp:1659] Recovered 0 agents from the registry (129B); allowing 10mins for agents to re-register I0906 20:02:07.308377 29085 hierarchical.cpp:176] Skipping recovery of hierarchical allocator: nothing to recover I0906 20:02:07.309029 29073 replica.cpp:537] Replica received write request for position 2 from __req_res__(6371)@172.17.0.3:60366 I0906 20:02:07.309675 29073 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 528589ns I0906 20:02:07.309706 29073 replica.cpp:708] Persisted action TRUNCATE at position 2 I0906 20:02:07.310412 29082 replica.cpp:691] Replica received learned notice for position 2 from @0.0.0.0:0 I0906 20:02:07.310714 29082 leveldb.cpp:341] Persisting action (18 bytes) to leveldb took 272545ns I0906 20:02:07.310772 29082 leveldb.cpp:399] Deleting ~1 keys from leveldb took 33082ns I0906 20:02:07.310802 29082 replica.cpp:708] Persisted action TRUNCATE at position 2 @ 0x2ac83c4d821e _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchIN5mesos8internal5slave5SlaveERKNS0_6FutureI6OptionINS5_10MasterInfoEEEESD_EEvRKNS0_3PIDIT_EEMSH_FvT0_ET1_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_ @ 0x2ac83d085c43 std::function<>::operator()() @ 0x2ac83d068bcb process::ProcessBase::visit() @ 0x2ac83d070fe0 process::DispatchEvent::visit() @ 0xa196b2 process::ProcessBase::serve() @ 0x2ac83d064ec0 process::ProcessManager::resume() @ 0x2ac83d061b2d _ZZN7process14ProcessManager12init_threadsEvENKUt_clEv @ 0x2ac83d070788 _ZNSt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEE9_M_invokeIIEEEvSt12_Index_tupleIIXspT_EEE @ 0x2ac83d0706df _ZNSt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEclEv @ 0x2ac83d070678 _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv @ 0x2ac83e9c0a60 (unknown) @ 0x2ac83f137184 start_thread @ 
0x2ac83f44737d (unknown)
make[4]: *** [check-local] Segmentation fault
{code}

It looks like the framework disconnects and the master shuts down prematurely. Attached is the full log from the CI run.

was:
Observed on Jenkins CI:
{code}
I0906 20:01:45.235483 29082 master.cpp:379] Master 9fd91e5d-4257-427d-a7da-3f18d99c8ffa (0a1dc2da838b) started on 172.17.0.3:60366
I0906 20:01:45.235513 29082 master.cpp:381] Flags at startup: --acls="" --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" --allocation_interval="1secs" --allocator="HierarchicalDRF" --authenticate_agents="true" --authenticate_frameworks="true" --authenticate_http_frameworks="true" --authenticate_http_readonly="true" --authenticate_http_readwrite="true" --authenticators="crammd5" --authorizers="local" --credentials="/tmp/ze1TG1/credentials" --framework_sorter="drf" --help="false" --hostname_lookup="true" --http_authenticators="basic" --http_framework_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --quiet="false" --recovery_agent_removal_limit="100%" --registry="replicated_log" --registry_fetch_timeout="1mins" --registry_store_timeout="100secs" --registry_strict="true" --root_submissions="true" --user_sorter="drf" --version="false" --webui_dir="/mesos/mesos-1.1.0/_inst/share/mesos/webui" --work_dir="/tmp/ze1TG1/master" --zk_session_timeout="10secs"
I0906 20:01:45.236022 29082 master.cpp:431] Master only allowing authenticated frameworks to register
I0906 20:01:45.236037 29082 master.cpp:445] Master only allowing authenticated agents to register
I0906 20:01:45.236045 29082 master.cpp:458] Master only allowing authenticated HTTP frameworks to register
I0906 20:01:45.236054 29082 credentials.hpp:37] Loading credentials for authentication from '/tmp/ze1TG1/credentials'
I0906 20:01:45.236392 29082 master.cpp:503]
Using default 'crammd5' authenticator I0906 20:01:45.236654 29079 replica.cpp:673] Replica in STARTING status received a broadcasted recover request from __req_res__(6359)@172.17.0.3:60366 I0906 20:01:45.236687 29082 http.cpp:883] Using default 'basic' HTTP authenticator for realm 'mesos-master-readonly' I0906 20:01:45.236927 29082 http.cpp:883] Using default 'basic' HTTP authenticator for realm 'mesos-master-readwrite' I0906 20:01:45.237095 29079 recover.cpp:197] Received a recover response from a replica in STARTING status I0906 20:01:45.237117 29082 http.cpp:883] Using default 'basic' HTTP authenticator for realm 'mesos-master-scheduler' I0906 20:01:45.237340 29082 master.cpp:583] Authorization enabled I0906 20:01:45.237663 29080 whitelist_watcher.cpp:77] No whitelist given I0906 20:01:45.237685 29075 hierarchical.cpp:149] Initialized hierarchical allocator process I0906 20:01:45.237835 29085 recover.cpp:568] Updating replica status to VOTING I0906 20:01:45.238531 29081 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 378674ns I0906 20:01:45.238560 29081 replica.cpp:320] Persisted replica status to VOTING I0906 20:01:45.238685 29073 recover.cpp:582] Successfully joined the Paxos group I0906 20:01:45.238975 29073 recover.cpp:466] Recover process terminated I0906 20:01:45.240437 29078 master.cpp:1850] Elected as the leading master! 
I0906 20:01:45.240468 29078 master.cpp:1551] Recovering from registrar I0906 20:01:45.240592 29080 registrar.cpp:332] Recovering registrar I0906 20:01:45.241178 29075 log.cpp:553] Attempting to start the writer I0906 20:01:45.242928 29072 replica.cpp:493] Replica received implicit promise request from __req_res__(6360)@172.17.0.3:60366 with proposal 1 I0906 20:01:45.243324 29072 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 335676ns I0906 20:01:45.243350 29072 replica.cpp:342] Persisted promised to 1 I0906 20:01:45.244056 29081 coordinator.cpp:238] Coordinator attempting to fill missing positions I0906 20:01:45.245538 29078 replica.cpp:388] Replica received explicit promise request from __req_res__(6361)@172.17.0.3:60366 for position 0 with proposal 2 I0906 20:01:45.245995 29078 leveldb.cpp:341] Persisting action (8 bytes) to leveldb took 412163ns I0906 20:01:45.246021 29078 replica.cpp:708] Persisted action NOP at position 0 I0906 20:01:45.247329 29082 replica.cpp:537] Replica received write request for position 0 from __req_res__(6362)@172.17.0.3:60366 I0906 20:01:45.247406 29082 leveldb.cpp:436] Reading position from leveldb took 35845ns I0906 20:01:45.247989 29082 leveldb.cpp:341] Persisting action (14 bytes) to leveldb took 541972ns I0906 20:01:45.248015 29082 replica.cpp:708] Persisted action NOP at position 0 I0906 20:01:45.248556 29084 replica.cpp:691] Replica received learned notice for position 0 from @0.0.0.0:0 I0906 20:01:45.249241 29084 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 647885ns I0906 20:01:45.249271 29084 replica.cpp:708] Persisted action NOP at position 0 I0906 20:01:45.249914 29085 log.cpp:569] Writer started with ending position 0 I0906 20:01:45.251022 29085 leveldb.cpp:436] Reading position from leveldb took 31388ns I0906 20:01:45.252149 29082 registrar.cpp:365] Successfully fetched the registry (0B) in 11.51104ms I0906 20:01:45.252271 29082 registrar.cpp:464] Applied 1 operations in 21341ns; attempting 
to update the registry I0906 20:01:45.253073 29078 log.cpp:577] Attempting to append 168 bytes to the log I0906 20:01:45.253250 29081 coordinator.cpp:348] Coordinator attempting to write APPEND action at position 1 I0906 20:01:45.254175 29070 replica.cpp:537] Replica received write request for position 1 from __req_res__(6363)@172.17.0.3:60366 I0906 20:01:45.254654 29070 leveldb.cpp:341] Persisting action (187 bytes) to leveldb took 435222ns I0906 20:01:45.254683 29070 replica.cpp:708] Persisted action APPEND at position 1 I0906 20:01:45.255455 29080 replica.cpp:691] Replica received learned notice for position 1 from @0.0.0.0:0 I0906 20:01:45.255926 29080 leveldb.cpp:341] Persisting action (189 bytes) to leveldb took 431510ns I0906 20:01:45.255980 29080 replica.cpp:708] Persisted action APPEND at position 1 I0906 20:01:45.257114 29073 registrar.cpp:509] Successfully updated the registry in 4.780032ms I0906 20:01:45.257305 29073 registrar.cpp:395] Successfully recovered registrar I0906 20:01:45.257380 29082 log.cpp:596] Attempting to truncate the log to 1 I0906 20:01:45.257515 29076 coordinator.cpp:348] Coordinator attempting to write TRUNCATE action at position 2 I0906 20:01:45.258153 29071 master.cpp:1659] Recovered 0 agents from the registry (129B); allowing 10mins for agents to re-register I0906 20:01:45.258191 29077 hierarchical.cpp:176] Skipping recovery of hierarchical allocator: nothing to recover I0906 20:01:45.258608 29082 replica.cpp:537] Replica received write request for position 2 from __req_res__(6364)@172.17.0.3:60366 I0906 20:01:45.259039 29082 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 388229ns I0906 20:01:45.259068 29082 replica.cpp:708] Persisted action TRUNCATE at position 2 I0906 20:01:45.259778 29071 replica.cpp:691] Replica received learned notice for position 2 from @0.0.0.0:0 I0906 20:01:45.260226 29071 leveldb.cpp:341] Persisting action (18 bytes) to leveldb took 411069ns I0906 20:01:45.260299 29071 leveldb.cpp:399] 
Deleting ~1 keys from leveldb took 40611ns I0906 20:01:45.260321 29071 replica.cpp:708] Persisted action TRUNCATE at position 2 I0906 20:01:45.266494 29085 slave.cpp:205] Mesos agent started on @172.17.0.3:60366 I0906 20:01:45.266513 29085 slave.cpp:206] Flags at startup: --acls="" --appc_simple_discovery_uri_prefix="http://" --appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="true" --authenticate_http_readwrite="true" --authenticatee="crammd5" --authentication_backoff_factor="1secs" --authorizer="local" --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" --cgroups_root="mesos" --container_disk_watch_interval="15secs" --containerizers="mesos" --credential="/tmp/DiskResource_PersistentVolumeTest_IncompatibleCheckpointedResources_0_SRjbqt/credential" --default_role="*" --disk_watch_interval="1mins" --docker="docker" --docker_kill_orphans="true" --docker_registry="https://registry-1.docker.io" --docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns" --docker_store_dir="/tmp/mesos/store/docker" --docker_volume_checkpoint_dir="/var/run/mesos/isolators/docker/volume" --enforce_container_disk_quota="false" --executor_registration_timeout="1mins" --executor_shutdown_grace_period="5secs" --fetcher_cache_dir="/tmp/DiskResource_PersistentVolumeTest_IncompatibleCheckpointedResources_0_SRjbqt/fetch" --fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1" --hadoop_home="" --help="false" --hostname_lookup="true" --http_authenticators="basic" --http_command_executor="false" --http_credentials="/tmp/DiskResource_PersistentVolumeTest_IncompatibleCheckpointedResources_0_SRjbqt/http_credentials" --image_provisioner_backend="copy" --initialize_driver_logging="true" --isolation="posix/cpu,posix/mem" --launcher_dir="/mesos/mesos-1.1.0/_build/src" --logbufsecs="0" --logging_level="INFO" 
--oversubscribed_resources_interval="15secs" --perf_duration="10secs" --perf_interval="1mins" --qos_correction_interval_min="0ns" --quiet="false" --recover="reconnect" --recovery_timeout="15mins" --registration_backoff_factor="10ms" --resources="[{"name":"cpus","role":"*","scalar":{"value":2.0},"type":"SCALAR"},{"name":"mem","role":"*","scalar":{"value":2048.0},"type":"SCALAR"},{"name":"disk","role":"role1","scalar":{"value":4096.0},"type":"SCALAR"}]" --revocable_cpu_low_priority="true" --runtime_dir="/tmp/DiskResource_PersistentVolumeTest_IncompatibleCheckpointedResources_0_SRjbqt" --sandbox_directory="/mnt/mesos/sandbox" --strict="true" --switch_user="true" --systemd_enable_support="true" --systemd_runtime_directory="/run/systemd/system" --version="false" --work_dir="/tmp/DiskResource_PersistentVolumeTest_IncompatibleCheckpointedResources_0_DFKGtZ" I0906 20:01:45.266980 29085 credentials.hpp:86] Loading credential for authentication from '/tmp/DiskResource_PersistentVolumeTest_IncompatibleCheckpointedResources_0_SRjbqt/credential' I0906 20:01:45.267125 29085 slave.cpp:343] Agent using credential for: test-principal I0906 20:01:45.267143 29085 credentials.hpp:37] Loading credentials for authentication from '/tmp/DiskResource_PersistentVolumeTest_IncompatibleCheckpointedResources_0_SRjbqt/http_credentials' I0906 20:01:45.267366 29085 http.cpp:883] Using default 'basic' HTTP authenticator for realm 'mesos-agent-readonly' I0906 20:01:45.267477 29085 http.cpp:883] Using default 'basic' HTTP authenticator for realm 'mesos-agent-readwrite' I0906 20:01:45.267544 29051 sched.cpp:226] Version: 1.1.0 I0906 20:01:45.268095 29074 sched.cpp:330] New master detected at master@172.17.0.3:60366 I0906 20:01:45.268167 29074 sched.cpp:396] Authenticating with master master@172.17.0.3:60366 I0906 20:01:45.268182 29074 sched.cpp:403] Using default CRAM-MD5 authenticatee I0906 20:01:45.268357 29078 authenticatee.cpp:121] Creating new client SASL connection I0906 20:01:45.268568 29077 
master.cpp:6167] Authenticating scheduler-d55313b3-c4cf-4517-843c-56aa3f74d9f7@172.17.0.3:60366 I0906 20:01:45.268654 29076 authenticator.cpp:414] Starting authentication session for crammd5-authenticatee(1048)@172.17.0.3:60366 I0906 20:01:45.268726 29085 slave.cpp:526] Agent resources: cpus(*):2; mem(*):2048; disk(role1):4096; ports(*):[31000-32000] I0906 20:01:45.268831 29085 slave.cpp:534] Agent attributes: [ ] I0906 20:01:45.268847 29085 slave.cpp:539] Agent hostname: 0a1dc2da838b I0906 20:01:45.268853 29080 authenticator.cpp:98] Creating new server SASL connection I0906 20:01:45.269053 29071 authenticatee.cpp:213] Received SASL authentication mechanisms: CRAM-MD5 I0906 20:01:45.269075 29071 authenticatee.cpp:239] Attempting to authenticate with mechanism 'CRAM-MD5' I0906 20:01:45.269160 29077 authenticator.cpp:204] Received SASL authentication start I0906 20:01:45.269218 29077 authenticator.cpp:326] Authentication requires more steps I0906 20:01:45.269314 29079 authenticatee.cpp:259] Received SASL authentication step I0906 20:01:45.269420 29081 authenticator.cpp:232] Received SASL authentication step I0906 20:01:45.269450 29081 auxprop.cpp:109] Request to lookup properties for user: 'test-principal' realm: '0a1dc2da838b' server FQDN: '0a1dc2da838b' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false I0906 20:01:45.269464 29081 auxprop.cpp:181] Looking up auxiliary property '*userPassword' I0906 20:01:45.269490 29081 auxprop.cpp:181] Looking up auxiliary property '*cmusaslsecretCRAM-MD5' I0906 20:01:45.269506 29081 auxprop.cpp:109] Request to lookup properties for user: 'test-principal' realm: '0a1dc2da838b' server FQDN: '0a1dc2da838b' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true I0906 20:01:45.269515 29081 auxprop.cpp:131] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true I0906 20:01:45.269521 29081 auxprop.cpp:131] Skipping auxiliary 
property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true I0906 20:01:45.269534 29081 authenticator.cpp:318] Authentication success I0906 20:01:45.269620 29070 authenticatee.cpp:299] Authentication success I0906 20:01:45.269661 29084 master.cpp:6197] Successfully authenticated principal 'test-principal' at scheduler-d55313b3-c4cf-4517-843c-56aa3f74d9f7@172.17.0.3:60366 I0906 20:01:45.269729 29074 authenticator.cpp:432] Authentication session cleanup for crammd5-authenticatee(1048)@172.17.0.3:60366 I0906 20:01:45.269861 29071 sched.cpp:502] Successfully authenticated with master master@172.17.0.3:60366 I0906 20:01:45.269877 29071 sched.cpp:820] Sending SUBSCRIBE call to master@172.17.0.3:60366 I0906 20:01:45.269948 29071 sched.cpp:853] Will retry registration in 1.200847472secs if necessary I0906 20:01:45.270069 29084 master.cpp:2424] Received SUBSCRIBE call for framework 'default' at scheduler-d55313b3-c4cf-4517-843c-56aa3f74d9f7@172.17.0.3:60366 I0906 20:01:45.270113 29084 master.cpp:1886] Authorizing framework principal 'test-principal' to receive offers for role 'role1' I0906 20:01:45.270314 29072 state.cpp:57] Recovering state from '/tmp/DiskResource_PersistentVolumeTest_IncompatibleCheckpointedResources_0_DFKGtZ/meta' I0906 20:01:45.270467 29070 master.cpp:2500] Subscribing framework default with checkpointing disabled and capabilities [ ] I0906 20:01:45.270505 29075 status_update_manager.cpp:203] Recovering status update manager I0906 20:01:45.270777 29081 slave.cpp:4887] Finished recovery I0906 20:01:45.270908 29074 sched.cpp:743] Framework registered with 9fd91e5d-4257-427d-a7da-3f18d99c8ffa-0000 I0906 20:01:45.270942 29074 sched.cpp:757] Scheduler::registered took 15584ns I0906 20:01:45.270970 29084 hierarchical.cpp:269] Added framework 9fd91e5d-4257-427d-a7da-3f18d99c8ffa-0000 I0906 20:01:45.271028 29084 hierarchical.cpp:1550] No allocations performed I0906 20:01:45.271051 29084 hierarchical.cpp:1645] No inverse offers to send out! 
I0906 20:01:45.271092 29084 hierarchical.cpp:1194] Performed allocation for 0 agents in 102494ns I0906 20:01:45.271229 29081 slave.cpp:5059] Querying resource estimator for oversubscribable resources I0906 20:01:45.271414 29075 status_update_manager.cpp:177] Pausing sending status updates I0906 20:01:45.271414 29081 slave.cpp:902] New master detected at master@172.17.0.3:60366 I0906 20:01:46.238718 29073 hierarchical.cpp:1550] No allocations performed I0906 20:02:00.269846 29071 master.cpp:1288] Framework 9fd91e5d-4257-427d-a7da-3f18d99c8ffa-0000 (default) at scheduler-d55313b3-c4cf-4517-843c-56aa3f74d9f7@172.17.0.3:60366 disconnected I0906 20:02:07.263937 29073 hierarchical.cpp:1645] No inverse offers to send out! I0906 20:02:07.263902 29071 master.cpp:2725] Disconnecting framework 9fd91e5d-4257-427d-a7da-3f18d99c8ffa-0000 (default) at scheduler-d55313b3-c4cf-4517-843c-56aa3f74d9f7@172.17.0.3:60366 I0906 20:02:07.264065 29071 master.cpp:2749] Deactivating framework 9fd91e5d-4257-427d-a7da-3f18d99c8ffa-0000 (default) at scheduler-d55313b3-c4cf-4517-843c-56aa3f74d9f7@172.17.0.3:60366 I0906 20:02:07.264094 29073 hierarchical.cpp:1194] Performed allocation for 0 agents in 21.025474006secs I0906 20:02:07.264175 29071 master.cpp:1301] Giving framework 9fd91e5d-4257-427d-a7da-3f18d99c8ffa-0000 (default) at scheduler-d55313b3-c4cf-4517-843c-56aa3f74d9f7@172.17.0.3:60366 0ns to failover I0906 20:02:07.264307 29073 hierarchical.cpp:380] Deactivated framework 9fd91e5d-4257-427d-a7da-3f18d99c8ffa-0000 I0906 20:02:07.264336 29071 master.cpp:1096] Master terminating *** Aborted at 1473192127 (unix time) try "date -d @1473192127" if you are using GNU date *** PC: @ 0x2ac83e9ac40b (unknown) *** SIGSEGV (@0x2ac880049000) received by PID 29051 (TID 0x2ac848bc4700) from PID 18446744071562366976; stack trace: *** @ 0x2ac8947d62c7 (unknown) @ 0x2ac8947da5a9 (unknown) I0906 20:02:07.269142 29085 hierarchical.cpp:331] Removed framework 9fd91e5d-4257-427d-a7da-3f18d99c8ffa-0000 @ 
0x2ac83f13f330 (unknown) @ 0x2ac83e9ac40b (unknown) @ 0x2ac83e9a3c05 (unknown) I0906 20:02:07.274950 29051 cluster.cpp:157] Creating default 'local' authorizer @ 0x2ac83d042c98 process::operator<<() I0906 20:02:07.277822 29051 leveldb.cpp:174] Opened db in 2.422111ms I0906 20:02:07.279304 29051 leveldb.cpp:181] Compacted db in 1.434065ms I0906 20:02:07.279400 29051 leveldb.cpp:196] Created db iterator in 26692ns I0906 20:02:07.279427 29051 leveldb.cpp:202] Seeked to beginning of db in 2257ns I0906 20:02:07.279448 29051 leveldb.cpp:271] Iterated through 0 keys in the db in 362ns I0906 20:02:07.279505 29051 replica.cpp:776] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned I0906 20:02:07.280604 29079 recover.cpp:451] Starting replica recovery I0906 20:02:07.281153 29079 recover.cpp:477] Replica is in EMPTY status I0906 20:02:07.282649 29071 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from __req_res__(6365)@172.17.0.3:60366 I0906 20:02:07.283185 29076 recover.cpp:197] Received a recover response from a replica in EMPTY status I0906 20:02:07.283640 29070 recover.cpp:568] Updating replica status to STARTING I0906 20:02:07.284180 29071 master.cpp:379] Master f6076bbd-3be2-4c01-b593-d50e2743a2c9 (0a1dc2da838b) started on 172.17.0.3:60366 I0906 20:02:07.284554 29075 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 654887ns I0906 20:02:07.284205 29071 master.cpp:381] Flags at startup: --acls="" --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" --allocation_interval="1secs" --allocator="HierarchicalDRF" --authenticate_agents="true" --authenticate_frameworks="true" --authenticate_http_frameworks="true" --authenticate_http_readonly="true" --authenticate_http_readwrite="true" --authenticators="crammd5" --authorizers="local" --credentials="/tmp/WfTwZm/credentials" --framework_sorter="drf" --help="false" --hostname_lookup="true" --http_authenticators="basic" 
--http_framework_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --quiet="false" --recovery_agent_removal_limit="100%" --registry="replicated_log" --registry_fetch_timeout="1mins" --registry_store_timeout="100secs" --registry_strict="true" --root_submissions="true" --user_sorter="drf" --version="false" --webui_dir="/mesos/mesos-1.1.0/_inst/share/mesos/webui" --work_dir="/tmp/WfTwZm/master" --zk_session_timeout="10secs" I0906 20:02:07.284587 29075 replica.cpp:320] Persisted replica status to STARTING I0906 20:02:07.284613 29071 master.cpp:431] Master only allowing authenticated frameworks to register I0906 20:02:07.284627 29071 master.cpp:445] Master only allowing authenticated agents to register I0906 20:02:07.284636 29071 master.cpp:458] Master only allowing authenticated HTTP frameworks to register I0906 20:02:07.284644 29071 credentials.hpp:37] Loading credentials for authentication from '/tmp/WfTwZm/credentials' I0906 20:02:07.284814 29078 recover.cpp:477] Replica is in STARTING status I0906 20:02:07.284943 29071 master.cpp:503] Using default 'crammd5' authenticator I0906 20:02:07.285138 29071 http.cpp:883] Using default 'basic' HTTP authenticator for realm 'mesos-master-readonly' I0906 20:02:07.285303 29071 http.cpp:883] Using default 'basic' HTTP authenticator for realm 'mesos-master-readwrite' I0906 20:02:07.285500 29071 http.cpp:883] Using default 'basic' HTTP authenticator for realm 'mesos-master-scheduler' I0906 20:02:07.285640 29071 master.cpp:583] Authorization enabled I0906 20:02:07.285848 29072 whitelist_watcher.cpp:77] No whitelist given I0906 20:02:07.286067 29083 hierarchical.cpp:149] Initialized hierarchical allocator process I0906 20:02:07.286173 29073 replica.cpp:673] Replica in STARTING status received a broadcasted recover request from 
__req_res__(6366)@172.17.0.3:60366 I0906 20:02:07.286520 29082 recover.cpp:197] Received a recover response from a replica in STARTING status I0906 20:02:07.287076 29073 recover.cpp:568] Updating replica status to VOTING I0906 20:02:07.287904 29084 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 597451ns I0906 20:02:07.287938 29084 replica.cpp:320] Persisted replica status to VOTING I0906 20:02:07.288169 29076 recover.cpp:582] Successfully joined the Paxos group I0906 20:02:07.288481 29076 recover.cpp:466] Recover process terminated I0906 20:02:07.289659 29084 master.cpp:1850] Elected as the leading master! I0906 20:02:07.289693 29084 master.cpp:1551] Recovering from registrar I0906 20:02:07.289862 29079 registrar.cpp:332] Recovering registrar I0906 20:02:07.290505 29075 log.cpp:553] Attempting to start the writer I0906 20:02:07.292006 29074 replica.cpp:493] Replica received implicit promise request from __req_res__(6367)@172.17.0.3:60366 with proposal 1 @ 0x2ac83c44fac3 mesos::internal::slave::Slave::authenticate() I0906 20:02:07.292558 29074 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 508694ns I0906 20:02:07.292584 29074 replica.cpp:342] Persisted promised to 1 I0906 20:02:07.293391 29080 coordinator.cpp:238] Coordinator attempting to fill missing positions I0906 20:02:07.294734 29073 replica.cpp:388] Replica received explicit promise request from __req_res__(6368)@172.17.0.3:60366 for position 0 with proposal 2 I0906 20:02:07.295254 29073 leveldb.cpp:341] Persisting action (8 bytes) to leveldb took 472361ns I0906 20:02:07.295285 29073 replica.cpp:708] Persisted action NOP at position 0 I0906 20:02:07.296751 29076 replica.cpp:537] Replica received write request for position 0 from __req_res__(6369)@172.17.0.3:60366 I0906 20:02:07.296835 29076 leveldb.cpp:436] Reading position from leveldb took 39744ns I0906 20:02:07.297452 29076 leveldb.cpp:341] Persisting action (14 bytes) to leveldb took 554740ns I0906 20:02:07.297485 29076 
replica.cpp:708] Persisted action NOP at position 0 I0906 20:02:07.298262 29083 replica.cpp:691] Replica received learned notice for position 0 from @0.0.0.0:0 I0906 20:02:07.298765 29083 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 460819ns I0906 20:02:07.298796 29083 replica.cpp:708] Persisted action NOP at position 0 @ 0x2ac83c44f56b mesos::internal::slave::Slave::detected() I0906 20:02:07.299576 29085 log.cpp:569] Writer started with ending position 0 I0906 20:02:07.300812 29071 leveldb.cpp:436] Reading position from leveldb took 31797ns I0906 20:02:07.301996 29073 registrar.cpp:365] Successfully fetched the registry (0B) in 12.048896ms I0906 20:02:07.302140 29073 registrar.cpp:464] Applied 1 operations in 32924ns; attempting to update the registry I0906 20:02:07.303042 29078 log.cpp:577] Attempting to append 168 bytes to the log I0906 20:02:07.303190 29079 coordinator.cpp:348] Coordinator attempting to write APPEND action at position 1 @ 0x2ac83c4a5d03 _ZZN7process8dispatchIN5mesos8internal5slave5SlaveERKNS_6FutureI6OptionINS1_10MasterInfoEEEES9_EEvRKNS_3PIDIT_EEMSD_FvT0_ET1_ENKUlPNS_11ProcessBaseEE_clESM_ I0906 20:02:07.304149 29076 replica.cpp:537] Replica received write request for position 1 from __req_res__(6370)@172.17.0.3:60366 I0906 20:02:07.304754 29076 leveldb.cpp:341] Persisting action (187 bytes) to leveldb took 546211ns I0906 20:02:07.304786 29076 replica.cpp:708] Persisted action APPEND at position 1 I0906 20:02:07.305613 29078 replica.cpp:691] Replica received learned notice for position 1 from @0.0.0.0:0 I0906 20:02:07.306145 29078 leveldb.cpp:341] Persisting action (189 bytes) to leveldb took 490605ns I0906 20:02:07.306182 29078 replica.cpp:708] Persisted action APPEND at position 1 I0906 20:02:07.307394 29070 registrar.cpp:509] Successfully updated the registry in 5.172736ms I0906 20:02:07.307579 29070 registrar.cpp:395] Successfully recovered registrar I0906 20:02:07.307659 29085 log.cpp:596] Attempting to truncate the log 
to 1 I0906 20:02:07.307802 29073 coordinator.cpp:348] Coordinator attempting to write TRUNCATE action at position 2 I0906 20:02:07.308280 29072 master.cpp:1659] Recovered 0 agents from the registry (129B); allowing 10mins for agents to re-register I0906 20:02:07.308377 29085 hierarchical.cpp:176] Skipping recovery of hierarchical allocator: nothing to recover I0906 20:02:07.309029 29073 replica.cpp:537] Replica received write request for position 2 from __req_res__(6371)@172.17.0.3:60366 I0906 20:02:07.309675 29073 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 528589ns I0906 20:02:07.309706 29073 replica.cpp:708] Persisted action TRUNCATE at position 2 I0906 20:02:07.310412 29082 replica.cpp:691] Replica received learned notice for position 2 from @0.0.0.0:0 I0906 20:02:07.310714 29082 leveldb.cpp:341] Persisting action (18 bytes) to leveldb took 272545ns I0906 20:02:07.310772 29082 leveldb.cpp:399] Deleting ~1 keys from leveldb took 33082ns I0906 20:02:07.310802 29082 replica.cpp:708] Persisted action TRUNCATE at position 2 @ 0x2ac83c4d821e _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchIN5mesos8internal5slave5SlaveERKNS0_6FutureI6OptionINS5_10MasterInfoEEEESD_EEvRKNS0_3PIDIT_EEMSH_FvT0_ET1_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_ @ 0x2ac83d085c43 std::function<>::operator()() @ 0x2ac83d068bcb process::ProcessBase::visit() @ 0x2ac83d070fe0 process::DispatchEvent::visit() @ 0xa196b2 process::ProcessBase::serve() @ 0x2ac83d064ec0 process::ProcessManager::resume() @ 0x2ac83d061b2d _ZZN7process14ProcessManager12init_threadsEvENKUt_clEv @ 0x2ac83d070788 _ZNSt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEE9_M_invokeIIEEEvSt12_Index_tupleIIXspT_EEE @ 0x2ac83d0706df _ZNSt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEclEv @ 0x2ac83d070678 _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv @ 0x2ac83e9c0a60 (unknown) @ 0x2ac83f137184 start_thread @ 
0x2ac83f44737d (unknown)
make[4]: *** [check-local] Segmentation fault
{code}

It looks like the framework disconnects and the master shuts down prematurely.

> Segfault during
> DiskResource/PersistentVolumeTest.IncompatibleCheckpointedResources/0
> -------------------------------------------------------------------------------------
>
> Key: MESOS-6137
> URL: https://issues.apache.org/jira/browse/MESOS-6137
> Project: Mesos
> Issue Type: Bug
> Affects Versions: 1.0.1
> Environment: Ubuntu 14.04, non-SSL, libev
> Reporter: Greg Mann
> Assignee: Greg Mann
> Labels: mesosphere
> Attachments: mesos-segfault.txt.zip
>
>
> Observed on Jenkins CI:
> {code}
> I0906 20:01:45.235483 29082 master.cpp:379] Master
> 9fd91e5d-4257-427d-a7da-3f18d99c8ffa (0a1dc2da838b) started on
> 172.17.0.3:60366
> I0906 20:01:45.235513 29082 master.cpp:381] Flags at startup: --acls=""
> --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins"
> --allocation_interval="1secs" --allocator="HierarchicalDRF"
> --authenticate_agents="true" --authenticate_frameworks="true"
> --authenticate_http_frameworks="true" --authenticate_http_readonly="true"
> --authenticate_http_readwrite="true" --authenticators="crammd5"
> --authorizers="local" --credentials="/tmp/ze1TG1/credentials"
> --framework_sorter="drf" --help="false" --hostname_lookup="true"
> --http_authenticators="basic" --http_framework_authenticators="basic"
> --initialize_driver_logging="true" --log_auto_initialize="true"
> --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5"
> --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000"
> --quiet="false" --recovery_agent_removal_limit="100%"
> --registry="replicated_log" --registry_fetch_timeout="1mins"
> --registry_store_timeout="100secs" --registry_strict="true"
> --root_submissions="true" --user_sorter="drf" --version="false"
> --webui_dir="/mesos/mesos-1.1.0/_inst/share/mesos/webui"
> --work_dir="/tmp/ze1TG1/master" --zk_session_timeout="10secs"
> 
I0906 20:01:45.236022 29082 master.cpp:431] Master only allowing > authenticated frameworks to register > I0906 20:01:45.236037 29082 master.cpp:445] Master only allowing > authenticated agents to register > I0906 20:01:45.236045 29082 master.cpp:458] Master only allowing > authenticated HTTP frameworks to register > I0906 20:01:45.236054 29082 credentials.hpp:37] Loading credentials for > authentication from '/tmp/ze1TG1/credentials' > I0906 20:01:45.236392 29082 master.cpp:503] Using default 'crammd5' > authenticator > I0906 20:01:45.236654 29079 replica.cpp:673] Replica in STARTING status > received a broadcasted recover request from __req_res__(6359)@172.17.0.3:60366 > I0906 20:01:45.236687 29082 http.cpp:883] Using default 'basic' HTTP > authenticator for realm 'mesos-master-readonly' > I0906 20:01:45.236927 29082 http.cpp:883] Using default 'basic' HTTP > authenticator for realm 'mesos-master-readwrite' > I0906 20:01:45.237095 29079 recover.cpp:197] Received a recover response from > a replica in STARTING status > I0906 20:01:45.237117 29082 http.cpp:883] Using default 'basic' HTTP > authenticator for realm 'mesos-master-scheduler' > I0906 20:01:45.237340 29082 master.cpp:583] Authorization enabled > I0906 20:01:45.237663 29080 whitelist_watcher.cpp:77] No whitelist given > I0906 20:01:45.237685 29075 hierarchical.cpp:149] Initialized hierarchical > allocator process > I0906 20:01:45.237835 29085 recover.cpp:568] Updating replica status to VOTING > I0906 20:01:45.238531 29081 leveldb.cpp:304] Persisting metadata (8 bytes) to > leveldb took 378674ns > I0906 20:01:45.238560 29081 replica.cpp:320] Persisted replica status to > VOTING > I0906 20:01:45.238685 29073 recover.cpp:582] Successfully joined the Paxos > group > I0906 20:01:45.238975 29073 recover.cpp:466] Recover process terminated > I0906 20:01:45.240437 29078 master.cpp:1850] Elected as the leading master! 
> I0906 20:01:45.240468 29078 master.cpp:1551] Recovering from registrar > I0906 20:01:45.240592 29080 registrar.cpp:332] Recovering registrar > I0906 20:01:45.241178 29075 log.cpp:553] Attempting to start the writer > I0906 20:01:45.242928 29072 replica.cpp:493] Replica received implicit > promise request from __req_res__(6360)@172.17.0.3:60366 with proposal 1 > I0906 20:01:45.243324 29072 leveldb.cpp:304] Persisting metadata (8 bytes) to > leveldb took 335676ns > I0906 20:01:45.243350 29072 replica.cpp:342] Persisted promised to 1 > I0906 20:01:45.244056 29081 coordinator.cpp:238] Coordinator attempting to > fill missing positions > I0906 20:01:45.245538 29078 replica.cpp:388] Replica received explicit > promise request from __req_res__(6361)@172.17.0.3:60366 for position 0 with > proposal 2 > I0906 20:01:45.245995 29078 leveldb.cpp:341] Persisting action (8 bytes) to > leveldb took 412163ns > I0906 20:01:45.246021 29078 replica.cpp:708] Persisted action NOP at position > 0 > I0906 20:01:45.247329 29082 replica.cpp:537] Replica received write request > for position 0 from __req_res__(6362)@172.17.0.3:60366 > I0906 20:01:45.247406 29082 leveldb.cpp:436] Reading position from leveldb > took 35845ns > I0906 20:01:45.247989 29082 leveldb.cpp:341] Persisting action (14 bytes) to > leveldb took 541972ns > I0906 20:01:45.248015 29082 replica.cpp:708] Persisted action NOP at position > 0 > I0906 20:01:45.248556 29084 replica.cpp:691] Replica received learned notice > for position 0 from @0.0.0.0:0 > I0906 20:01:45.249241 29084 leveldb.cpp:341] Persisting action (16 bytes) to > leveldb took 647885ns > I0906 20:01:45.249271 29084 replica.cpp:708] Persisted action NOP at position > 0 > I0906 20:01:45.249914 29085 log.cpp:569] Writer started with ending position 0 > I0906 20:01:45.251022 29085 leveldb.cpp:436] Reading position from leveldb > took 31388ns > I0906 20:01:45.252149 29082 registrar.cpp:365] Successfully fetched the > registry (0B) in 11.51104ms > I0906 
20:01:45.252271 29082 registrar.cpp:464] Applied 1 operations in > 21341ns; attempting to update the registry > I0906 20:01:45.253073 29078 log.cpp:577] Attempting to append 168 bytes to > the log > I0906 20:01:45.253250 29081 coordinator.cpp:348] Coordinator attempting to > write APPEND action at position 1 > I0906 20:01:45.254175 29070 replica.cpp:537] Replica received write request > for position 1 from __req_res__(6363)@172.17.0.3:60366 > I0906 20:01:45.254654 29070 leveldb.cpp:341] Persisting action (187 bytes) to > leveldb took 435222ns > I0906 20:01:45.254683 29070 replica.cpp:708] Persisted action APPEND at > position 1 > I0906 20:01:45.255455 29080 replica.cpp:691] Replica received learned notice > for position 1 from @0.0.0.0:0 > I0906 20:01:45.255926 29080 leveldb.cpp:341] Persisting action (189 bytes) to > leveldb took 431510ns > I0906 20:01:45.255980 29080 replica.cpp:708] Persisted action APPEND at > position 1 > I0906 20:01:45.257114 29073 registrar.cpp:509] Successfully updated the > registry in 4.780032ms > I0906 20:01:45.257305 29073 registrar.cpp:395] Successfully recovered > registrar > I0906 20:01:45.257380 29082 log.cpp:596] Attempting to truncate the log to 1 > I0906 20:01:45.257515 29076 coordinator.cpp:348] Coordinator attempting to > write TRUNCATE action at position 2 > I0906 20:01:45.258153 29071 master.cpp:1659] Recovered 0 agents from the > registry (129B); allowing 10mins for agents to re-register > I0906 20:01:45.258191 29077 hierarchical.cpp:176] Skipping recovery of > hierarchical allocator: nothing to recover > I0906 20:01:45.258608 29082 replica.cpp:537] Replica received write request > for position 2 from __req_res__(6364)@172.17.0.3:60366 > I0906 20:01:45.259039 29082 leveldb.cpp:341] Persisting action (16 bytes) to > leveldb took 388229ns > I0906 20:01:45.259068 29082 replica.cpp:708] Persisted action TRUNCATE at > position 2 > I0906 20:01:45.259778 29071 replica.cpp:691] Replica received learned notice > for position 2 from 
@0.0.0.0:0 > I0906 20:01:45.260226 29071 leveldb.cpp:341] Persisting action (18 bytes) to > leveldb took 411069ns > I0906 20:01:45.260299 29071 leveldb.cpp:399] Deleting ~1 keys from leveldb > took 40611ns > I0906 20:01:45.260321 29071 replica.cpp:708] Persisted action TRUNCATE at > position 2 > I0906 20:01:45.266494 29085 slave.cpp:205] Mesos agent started on > @172.17.0.3:60366 > I0906 20:01:45.266513 29085 slave.cpp:206] Flags at startup: --acls="" > --appc_simple_discovery_uri_prefix="http://" > --appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="true" > --authenticate_http_readwrite="true" --authenticatee="crammd5" > --authentication_backoff_factor="1secs" --authorizer="local" > --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" > --cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" > --cgroups_root="mesos" --container_disk_watch_interval="15secs" > --containerizers="mesos" > --credential="/tmp/DiskResource_PersistentVolumeTest_IncompatibleCheckpointedResources_0_SRjbqt/credential" > --default_role="*" --disk_watch_interval="1mins" --docker="docker" > --docker_kill_orphans="true" --docker_registry="https://registry-1.docker.io" > --docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock" > --docker_stop_timeout="0ns" --docker_store_dir="/tmp/mesos/store/docker" > --docker_volume_checkpoint_dir="/var/run/mesos/isolators/docker/volume" > --enforce_container_disk_quota="false" > --executor_registration_timeout="1mins" > --executor_shutdown_grace_period="5secs" > --fetcher_cache_dir="/tmp/DiskResource_PersistentVolumeTest_IncompatibleCheckpointedResources_0_SRjbqt/fetch" > --fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks" > --gc_disk_headroom="0.1" --hadoop_home="" --help="false" > --hostname_lookup="true" --http_authenticators="basic" > --http_command_executor="false" > 
--http_credentials="/tmp/DiskResource_PersistentVolumeTest_IncompatibleCheckpointedResources_0_SRjbqt/http_credentials" > --image_provisioner_backend="copy" --initialize_driver_logging="true" > --isolation="posix/cpu,posix/mem" > --launcher_dir="/mesos/mesos-1.1.0/_build/src" --logbufsecs="0" > --logging_level="INFO" --oversubscribed_resources_interval="15secs" > --perf_duration="10secs" --perf_interval="1mins" > --qos_correction_interval_min="0ns" --quiet="false" --recover="reconnect" > --recovery_timeout="15mins" --registration_backoff_factor="10ms" > --resources="[{"name":"cpus","role":"*","scalar":{"value":2.0},"type":"SCALAR"},{"name":"mem","role":"*","scalar":{"value":2048.0},"type":"SCALAR"},{"name":"disk","role":"role1","scalar":{"value":4096.0},"type":"SCALAR"}]" > --revocable_cpu_low_priority="true" > --runtime_dir="/tmp/DiskResource_PersistentVolumeTest_IncompatibleCheckpointedResources_0_SRjbqt" > --sandbox_directory="/mnt/mesos/sandbox" --strict="true" > --switch_user="true" --systemd_enable_support="true" > --systemd_runtime_directory="/run/systemd/system" --version="false" > --work_dir="/tmp/DiskResource_PersistentVolumeTest_IncompatibleCheckpointedResources_0_DFKGtZ" > I0906 20:01:45.266980 29085 credentials.hpp:86] Loading credential for > authentication from > '/tmp/DiskResource_PersistentVolumeTest_IncompatibleCheckpointedResources_0_SRjbqt/credential' > I0906 20:01:45.267125 29085 slave.cpp:343] Agent using credential for: > test-principal > I0906 20:01:45.267143 29085 credentials.hpp:37] Loading credentials for > authentication from > '/tmp/DiskResource_PersistentVolumeTest_IncompatibleCheckpointedResources_0_SRjbqt/http_credentials' > I0906 20:01:45.267366 29085 http.cpp:883] Using default 'basic' HTTP > authenticator for realm 'mesos-agent-readonly' > I0906 20:01:45.267477 29085 http.cpp:883] Using default 'basic' HTTP > authenticator for realm 'mesos-agent-readwrite' > I0906 20:01:45.267544 29051 sched.cpp:226] Version: 1.1.0 > I0906 
20:01:45.268095 29074 sched.cpp:330] New master detected at > master@172.17.0.3:60366 > I0906 20:01:45.268167 29074 sched.cpp:396] Authenticating with master > master@172.17.0.3:60366 > I0906 20:01:45.268182 29074 sched.cpp:403] Using default CRAM-MD5 > authenticatee > I0906 20:01:45.268357 29078 authenticatee.cpp:121] Creating new client SASL > connection > I0906 20:01:45.268568 29077 master.cpp:6167] Authenticating > scheduler-d55313b3-c4cf-4517-843c-56aa3f74d9f7@172.17.0.3:60366 > I0906 20:01:45.268654 29076 authenticator.cpp:414] Starting authentication > session for crammd5-authenticatee(1048)@172.17.0.3:60366 > I0906 20:01:45.268726 29085 slave.cpp:526] Agent resources: cpus(*):2; > mem(*):2048; disk(role1):4096; ports(*):[31000-32000] > I0906 20:01:45.268831 29085 slave.cpp:534] Agent attributes: [ ] > I0906 20:01:45.268847 29085 slave.cpp:539] Agent hostname: 0a1dc2da838b > I0906 20:01:45.268853 29080 authenticator.cpp:98] Creating new server SASL > connection > I0906 20:01:45.269053 29071 authenticatee.cpp:213] Received SASL > authentication mechanisms: CRAM-MD5 > I0906 20:01:45.269075 29071 authenticatee.cpp:239] Attempting to authenticate > with mechanism 'CRAM-MD5' > I0906 20:01:45.269160 29077 authenticator.cpp:204] Received SASL > authentication start > I0906 20:01:45.269218 29077 authenticator.cpp:326] Authentication requires > more steps > I0906 20:01:45.269314 29079 authenticatee.cpp:259] Received SASL > authentication step > I0906 20:01:45.269420 29081 authenticator.cpp:232] Received SASL > authentication step > I0906 20:01:45.269450 29081 auxprop.cpp:109] Request to lookup properties for > user: 'test-principal' realm: '0a1dc2da838b' server FQDN: '0a1dc2da838b' > SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false > SASL_AUXPROP_AUTHZID: false > I0906 20:01:45.269464 29081 auxprop.cpp:181] Looking up auxiliary property > '*userPassword' > I0906 20:01:45.269490 29081 auxprop.cpp:181] Looking up auxiliary property > 
'*cmusaslsecretCRAM-MD5' > I0906 20:01:45.269506 29081 auxprop.cpp:109] Request to lookup properties for > user: 'test-principal' realm: '0a1dc2da838b' server FQDN: '0a1dc2da838b' > SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false > SASL_AUXPROP_AUTHZID: true > I0906 20:01:45.269515 29081 auxprop.cpp:131] Skipping auxiliary property > '*userPassword' since SASL_AUXPROP_AUTHZID == true > I0906 20:01:45.269521 29081 auxprop.cpp:131] Skipping auxiliary property > '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true > I0906 20:01:45.269534 29081 authenticator.cpp:318] Authentication success > I0906 20:01:45.269620 29070 authenticatee.cpp:299] Authentication success > I0906 20:01:45.269661 29084 master.cpp:6197] Successfully authenticated > principal 'test-principal' at > scheduler-d55313b3-c4cf-4517-843c-56aa3f74d9f7@172.17.0.3:60366 > I0906 20:01:45.269729 29074 authenticator.cpp:432] Authentication session > cleanup for crammd5-authenticatee(1048)@172.17.0.3:60366 > I0906 20:01:45.269861 29071 sched.cpp:502] Successfully authenticated with > master master@172.17.0.3:60366 > I0906 20:01:45.269877 29071 sched.cpp:820] Sending SUBSCRIBE call to > master@172.17.0.3:60366 > I0906 20:01:45.269948 29071 sched.cpp:853] Will retry registration in > 1.200847472secs if necessary > I0906 20:01:45.270069 29084 master.cpp:2424] Received SUBSCRIBE call for > framework 'default' at > scheduler-d55313b3-c4cf-4517-843c-56aa3f74d9f7@172.17.0.3:60366 > I0906 20:01:45.270113 29084 master.cpp:1886] Authorizing framework principal > 'test-principal' to receive offers for role 'role1' > I0906 20:01:45.270314 29072 state.cpp:57] Recovering state from > '/tmp/DiskResource_PersistentVolumeTest_IncompatibleCheckpointedResources_0_DFKGtZ/meta' > I0906 20:01:45.270467 29070 master.cpp:2500] Subscribing framework default > with checkpointing disabled and capabilities [ ] > I0906 20:01:45.270505 29075 status_update_manager.cpp:203] Recovering status > update manager > 
I0906 20:01:45.270777 29081 slave.cpp:4887] Finished recovery > I0906 20:01:45.270908 29074 sched.cpp:743] Framework registered with > 9fd91e5d-4257-427d-a7da-3f18d99c8ffa-0000 > I0906 20:01:45.270942 29074 sched.cpp:757] Scheduler::registered took 15584ns > I0906 20:01:45.270970 29084 hierarchical.cpp:269] Added framework > 9fd91e5d-4257-427d-a7da-3f18d99c8ffa-0000 > I0906 20:01:45.271028 29084 hierarchical.cpp:1550] No allocations performed > I0906 20:01:45.271051 29084 hierarchical.cpp:1645] No inverse offers to send > out! > I0906 20:01:45.271092 29084 hierarchical.cpp:1194] Performed allocation for 0 > agents in 102494ns > I0906 20:01:45.271229 29081 slave.cpp:5059] Querying resource estimator for > oversubscribable resources > I0906 20:01:45.271414 29075 status_update_manager.cpp:177] Pausing sending > status updates > I0906 20:01:45.271414 29081 slave.cpp:902] New master detected at > master@172.17.0.3:60366 > I0906 20:01:46.238718 29073 hierarchical.cpp:1550] No allocations performed > I0906 20:02:00.269846 29071 master.cpp:1288] Framework > 9fd91e5d-4257-427d-a7da-3f18d99c8ffa-0000 (default) at > scheduler-d55313b3-c4cf-4517-843c-56aa3f74d9f7@172.17.0.3:60366 disconnected > I0906 20:02:07.263937 29073 hierarchical.cpp:1645] No inverse offers to send > out! 
> I0906 20:02:07.263902 29071 master.cpp:2725] Disconnecting framework > 9fd91e5d-4257-427d-a7da-3f18d99c8ffa-0000 (default) at > scheduler-d55313b3-c4cf-4517-843c-56aa3f74d9f7@172.17.0.3:60366 > I0906 20:02:07.264065 29071 master.cpp:2749] Deactivating framework > 9fd91e5d-4257-427d-a7da-3f18d99c8ffa-0000 (default) at > scheduler-d55313b3-c4cf-4517-843c-56aa3f74d9f7@172.17.0.3:60366 > I0906 20:02:07.264094 29073 hierarchical.cpp:1194] Performed allocation for 0 > agents in 21.025474006secs > I0906 20:02:07.264175 29071 master.cpp:1301] Giving framework > 9fd91e5d-4257-427d-a7da-3f18d99c8ffa-0000 (default) at > scheduler-d55313b3-c4cf-4517-843c-56aa3f74d9f7@172.17.0.3:60366 0ns to > failover > I0906 20:02:07.264307 29073 hierarchical.cpp:380] Deactivated framework > 9fd91e5d-4257-427d-a7da-3f18d99c8ffa-0000 > I0906 20:02:07.264336 29071 master.cpp:1096] Master terminating > *** Aborted at 1473192127 (unix time) try "date -d @1473192127" if you are > using GNU date *** > PC: @ 0x2ac83e9ac40b (unknown) > *** SIGSEGV (@0x2ac880049000) received by PID 29051 (TID 0x2ac848bc4700) from > PID 18446744071562366976; stack trace: *** > @ 0x2ac8947d62c7 (unknown) > @ 0x2ac8947da5a9 (unknown) > I0906 20:02:07.269142 29085 hierarchical.cpp:331] Removed framework > 9fd91e5d-4257-427d-a7da-3f18d99c8ffa-0000 > @ 0x2ac83f13f330 (unknown) > @ 0x2ac83e9ac40b (unknown) > @ 0x2ac83e9a3c05 (unknown) > I0906 20:02:07.274950 29051 cluster.cpp:157] Creating default 'local' > authorizer > @ 0x2ac83d042c98 process::operator<<() > I0906 20:02:07.277822 29051 leveldb.cpp:174] Opened db in 2.422111ms > I0906 20:02:07.279304 29051 leveldb.cpp:181] Compacted db in 1.434065ms > I0906 20:02:07.279400 29051 leveldb.cpp:196] Created db iterator in 26692ns > I0906 20:02:07.279427 29051 leveldb.cpp:202] Seeked to beginning of db in > 2257ns > I0906 20:02:07.279448 29051 leveldb.cpp:271] Iterated through 0 keys in the > db in 362ns > I0906 20:02:07.279505 29051 replica.cpp:776] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned > I0906 20:02:07.280604 29079 recover.cpp:451] Starting replica recovery > I0906 20:02:07.281153 29079 recover.cpp:477] Replica is in EMPTY status > I0906 20:02:07.282649 29071 replica.cpp:673] Replica in EMPTY status received > a broadcasted recover request from __req_res__(6365)@172.17.0.3:60366 > I0906 20:02:07.283185 29076 recover.cpp:197] Received a recover response from > a replica in EMPTY status > I0906 20:02:07.283640 29070 recover.cpp:568] Updating replica status to > STARTING > I0906 20:02:07.284180 29071 master.cpp:379] Master > f6076bbd-3be2-4c01-b593-d50e2743a2c9 (0a1dc2da838b) started on > 172.17.0.3:60366 > I0906 20:02:07.284554 29075 leveldb.cpp:304] Persisting metadata (8 bytes) to > leveldb took 654887ns > I0906 20:02:07.284205 29071 master.cpp:381] Flags at startup: --acls="" > --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" > --allocation_interval="1secs" --allocator="HierarchicalDRF" > --authenticate_agents="true" --authenticate_frameworks="true" > --authenticate_http_frameworks="true" --authenticate_http_readonly="true" > --authenticate_http_readwrite="true" --authenticators="crammd5" > --authorizers="local" --credentials="/tmp/WfTwZm/credentials" > --framework_sorter="drf" --help="false" --hostname_lookup="true" > --http_authenticators="basic" --http_framework_authenticators="basic" > --initialize_driver_logging="true" --log_auto_initialize="true" > --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" > --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" > --quiet="false" --recovery_agent_removal_limit="100%" > --registry="replicated_log" --registry_fetch_timeout="1mins" > --registry_store_timeout="100secs" --registry_strict="true" > --root_submissions="true" --user_sorter="drf" --version="false" > --webui_dir="/mesos/mesos-1.1.0/_inst/share/mesos/webui" > --work_dir="/tmp/WfTwZm/master" --zk_session_timeout="10secs" > I0906 
20:02:07.284587 29075 replica.cpp:320] Persisted replica status to > STARTING > I0906 20:02:07.284613 29071 master.cpp:431] Master only allowing > authenticated frameworks to register > I0906 20:02:07.284627 29071 master.cpp:445] Master only allowing > authenticated agents to register > I0906 20:02:07.284636 29071 master.cpp:458] Master only allowing > authenticated HTTP frameworks to register > I0906 20:02:07.284644 29071 credentials.hpp:37] Loading credentials for > authentication from '/tmp/WfTwZm/credentials' > I0906 20:02:07.284814 29078 recover.cpp:477] Replica is in STARTING status > I0906 20:02:07.284943 29071 master.cpp:503] Using default 'crammd5' > authenticator > I0906 20:02:07.285138 29071 http.cpp:883] Using default 'basic' HTTP > authenticator for realm 'mesos-master-readonly' > I0906 20:02:07.285303 29071 http.cpp:883] Using default 'basic' HTTP > authenticator for realm 'mesos-master-readwrite' > I0906 20:02:07.285500 29071 http.cpp:883] Using default 'basic' HTTP > authenticator for realm 'mesos-master-scheduler' > I0906 20:02:07.285640 29071 master.cpp:583] Authorization enabled > I0906 20:02:07.285848 29072 whitelist_watcher.cpp:77] No whitelist given > I0906 20:02:07.286067 29083 hierarchical.cpp:149] Initialized hierarchical > allocator process > I0906 20:02:07.286173 29073 replica.cpp:673] Replica in STARTING status > received a broadcasted recover request from __req_res__(6366)@172.17.0.3:60366 > I0906 20:02:07.286520 29082 recover.cpp:197] Received a recover response from > a replica in STARTING status > I0906 20:02:07.287076 29073 recover.cpp:568] Updating replica status to VOTING > I0906 20:02:07.287904 29084 leveldb.cpp:304] Persisting metadata (8 bytes) to > leveldb took 597451ns > I0906 20:02:07.287938 29084 replica.cpp:320] Persisted replica status to > VOTING > I0906 20:02:07.288169 29076 recover.cpp:582] Successfully joined the Paxos > group > I0906 20:02:07.288481 29076 recover.cpp:466] Recover process terminated > I0906 
20:02:07.289659 29084 master.cpp:1850] Elected as the leading master! > I0906 20:02:07.289693 29084 master.cpp:1551] Recovering from registrar > I0906 20:02:07.289862 29079 registrar.cpp:332] Recovering registrar > I0906 20:02:07.290505 29075 log.cpp:553] Attempting to start the writer > I0906 20:02:07.292006 29074 replica.cpp:493] Replica received implicit > promise request from __req_res__(6367)@172.17.0.3:60366 with proposal 1 > @ 0x2ac83c44fac3 mesos::internal::slave::Slave::authenticate() > I0906 20:02:07.292558 29074 leveldb.cpp:304] Persisting metadata (8 bytes) to > leveldb took 508694ns > I0906 20:02:07.292584 29074 replica.cpp:342] Persisted promised to 1 > I0906 20:02:07.293391 29080 coordinator.cpp:238] Coordinator attempting to > fill missing positions > I0906 20:02:07.294734 29073 replica.cpp:388] Replica received explicit > promise request from __req_res__(6368)@172.17.0.3:60366 for position 0 with > proposal 2 > I0906 20:02:07.295254 29073 leveldb.cpp:341] Persisting action (8 bytes) to > leveldb took 472361ns > I0906 20:02:07.295285 29073 replica.cpp:708] Persisted action NOP at position > 0 > I0906 20:02:07.296751 29076 replica.cpp:537] Replica received write request > for position 0 from __req_res__(6369)@172.17.0.3:60366 > I0906 20:02:07.296835 29076 leveldb.cpp:436] Reading position from leveldb > took 39744ns > I0906 20:02:07.297452 29076 leveldb.cpp:341] Persisting action (14 bytes) to > leveldb took 554740ns > I0906 20:02:07.297485 29076 replica.cpp:708] Persisted action NOP at position > 0 > I0906 20:02:07.298262 29083 replica.cpp:691] Replica received learned notice > for position 0 from @0.0.0.0:0 > I0906 20:02:07.298765 29083 leveldb.cpp:341] Persisting action (16 bytes) to > leveldb took 460819ns > I0906 20:02:07.298796 29083 replica.cpp:708] Persisted action NOP at position > 0 > @ 0x2ac83c44f56b mesos::internal::slave::Slave::detected() > I0906 20:02:07.299576 29085 log.cpp:569] Writer started with ending position 0 > I0906 
20:02:07.300812 29071 leveldb.cpp:436] Reading position from leveldb > took 31797ns > I0906 20:02:07.301996 29073 registrar.cpp:365] Successfully fetched the > registry (0B) in 12.048896ms > I0906 20:02:07.302140 29073 registrar.cpp:464] Applied 1 operations in > 32924ns; attempting to update the registry > I0906 20:02:07.303042 29078 log.cpp:577] Attempting to append 168 bytes to > the log > I0906 20:02:07.303190 29079 coordinator.cpp:348] Coordinator attempting to > write APPEND action at position 1 > @ 0x2ac83c4a5d03 > _ZZN7process8dispatchIN5mesos8internal5slave5SlaveERKNS_6FutureI6OptionINS1_10MasterInfoEEEES9_EEvRKNS_3PIDIT_EEMSD_FvT0_ET1_ENKUlPNS_11ProcessBaseEE_clESM_ > I0906 20:02:07.304149 29076 replica.cpp:537] Replica received write request > for position 1 from __req_res__(6370)@172.17.0.3:60366 > I0906 20:02:07.304754 29076 leveldb.cpp:341] Persisting action (187 bytes) to > leveldb took 546211ns > I0906 20:02:07.304786 29076 replica.cpp:708] Persisted action APPEND at > position 1 > I0906 20:02:07.305613 29078 replica.cpp:691] Replica received learned notice > for position 1 from @0.0.0.0:0 > I0906 20:02:07.306145 29078 leveldb.cpp:341] Persisting action (189 bytes) to > leveldb took 490605ns > I0906 20:02:07.306182 29078 replica.cpp:708] Persisted action APPEND at > position 1 > I0906 20:02:07.307394 29070 registrar.cpp:509] Successfully updated the > registry in 5.172736ms > I0906 20:02:07.307579 29070 registrar.cpp:395] Successfully recovered > registrar > I0906 20:02:07.307659 29085 log.cpp:596] Attempting to truncate the log to 1 > I0906 20:02:07.307802 29073 coordinator.cpp:348] Coordinator attempting to > write TRUNCATE action at position 2 > I0906 20:02:07.308280 29072 master.cpp:1659] Recovered 0 agents from the > registry (129B); allowing 10mins for agents to re-register > I0906 20:02:07.308377 29085 hierarchical.cpp:176] Skipping recovery of > hierarchical allocator: nothing to recover > I0906 20:02:07.309029 29073 replica.cpp:537] Replica 
received write request > for position 2 from __req_res__(6371)@172.17.0.3:60366 > I0906 20:02:07.309675 29073 leveldb.cpp:341] Persisting action (16 bytes) to > leveldb took 528589ns > I0906 20:02:07.309706 29073 replica.cpp:708] Persisted action TRUNCATE at > position 2 > I0906 20:02:07.310412 29082 replica.cpp:691] Replica received learned notice > for position 2 from @0.0.0.0:0 > I0906 20:02:07.310714 29082 leveldb.cpp:341] Persisting action (18 bytes) to > leveldb took 272545ns > I0906 20:02:07.310772 29082 leveldb.cpp:399] Deleting ~1 keys from leveldb > took 33082ns > I0906 20:02:07.310802 29082 replica.cpp:708] Persisted action TRUNCATE at > position 2 > @ 0x2ac83c4d821e > _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchIN5mesos8internal5slave5SlaveERKNS0_6FutureI6OptionINS5_10MasterInfoEEEESD_EEvRKNS0_3PIDIT_EEMSH_FvT0_ET1_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_ > @ 0x2ac83d085c43 std::function<>::operator()() > @ 0x2ac83d068bcb process::ProcessBase::visit() > @ 0x2ac83d070fe0 process::DispatchEvent::visit() > @ 0xa196b2 process::ProcessBase::serve() > @ 0x2ac83d064ec0 process::ProcessManager::resume() > @ 0x2ac83d061b2d > _ZZN7process14ProcessManager12init_threadsEvENKUt_clEv > @ 0x2ac83d070788 > _ZNSt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEE9_M_invokeIIEEEvSt12_Index_tupleIIXspT_EEE > @ 0x2ac83d0706df > _ZNSt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEclEv > @ 0x2ac83d070678 > _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv > @ 0x2ac83e9c0a60 (unknown) > @ 0x2ac83f137184 start_thread > @ 0x2ac83f44737d (unknown) > make[4]: *** [check-local] Segmentation fault > {code} > It looks like the framework disconnects and the master shuts down prematurely. > Attached is the full log from the CI run.