[ https://issues.apache.org/jira/browse/MESOS-7729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Benjamin Mahler updated MESOS-7729: ----------------------------------- Fix Version/s: 1.5.0 > ExamplesTest.DynamicReservationFramework is flaky > ------------------------------------------------- > > Key: MESOS-7729 > URL: https://issues.apache.org/jira/browse/MESOS-7729 > Project: Mesos > Issue Type: Bug > Reporter: Vinod Kone > Labels: flaky-test, mesosphere-oncall > Fix For: 1.5.0 > > > Observed this on ASF CI > {code} > [ RUN ] ExamplesTest.DynamicReservationFramework > Using temporary directory > '/tmp/ExamplesTest_DynamicReservationFramework_uPVIaN' > /mesos/mesos-1.4.0/src/tests/dynamic_reservation_framework_test.sh: line 19: > /mesos/mesos-1.4.0/_build/src/colors.sh: No such file or directory > /mesos/mesos-1.4.0/src/tests/dynamic_reservation_framework_test.sh: line 20: > /mesos/mesos-1.4.0/_build/src/atexit.sh: No such file or directory > WARNING: Logging before InitGoogleLogging() is written to STDERR > I0627 16:04:20.661948 8847 process.cpp:1282] libprocess is initialized on > 172.17.0.3:37113 with 16 worker threads > I0627 16:04:20.662199 8847 logging.cpp:199] Logging to STDERR > I0627 16:04:20.674317 8847 leveldb.cpp:174] Opened db in 3.343216ms > I0627 16:04:20.675655 8847 leveldb.cpp:181] Compacted db in 1.264481ms > I0627 16:04:20.675797 8847 leveldb.cpp:196] Created db iterator in 89655ns > I0627 16:04:20.675829 8847 leveldb.cpp:202] Seeked to beginning of db in > 5551ns > I0627 16:04:20.675848 8847 leveldb.cpp:271] Iterated through 0 keys in the > db in 1133ns > I0627 16:04:20.676103 8847 replica.cpp:779] Replica recovered with log > positions 0 -> 0 with 1 holes and 0 unlearned > I0627 16:04:20.680465 8873 recover.cpp:451] Starting replica recovery > I0627 16:04:20.681649 8873 recover.cpp:477] Replica is in EMPTY status > I0627 16:04:20.682160 8847 local.cpp:272] Creating default 'local' authorizer > I0627 16:04:20.684504 8884 replica.cpp:676] Replica in EMPTY status received > a broadcasted recover request from __req_res__(1)@172.17.0.3:37113 > I0627 16:04:20.685750 8882 recover.cpp:197] Received a recover response from > a replica in EMPTY status > I0627 16:04:20.686617 8877 recover.cpp:568] Updating replica status to > STARTING > I0627 16:04:20.688508 8877 leveldb.cpp:304] Persisting metadata (8 bytes) to > leveldb took 741914ns > I0627 16:04:20.688544 8881 master.cpp:438] Master > c1c3a180-5bd3-42fa-b84e-e2c30aba7364 (089cb2cc2625) started on > 172.17.0.3:37113 > I0627 16:04:20.688551 8877 replica.cpp:322] Persisted replica status to > STARTING > I0627 16:04:20.689095 8878 recover.cpp:477] Replica is in STARTING status > I0627 16:04:20.688582 8881 master.cpp:440] Flags at startup: > --acls="permissive: true > register_frameworks { > principals { > type: ANY > } > roles { > type: SOME > values: "test" > } > } > " --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" > --allocation_interval="1secs" --allocator="HierarchicalDRF" > --authenticate_agents="false" --authenticate_frameworks="false" > --authenticate_http_frameworks="false" --authenticate_http_readonly="false" > --authenticate_http_readwrite="false" --authenticators="crammd5" > --authorizers="local" > --credentials="/tmp/ExamplesTest_DynamicReservationFramework_uPVIaN/credentials" > --filter_gpu_resources="true" --framework_sorter="drf" --help="false" > --hostname_lookup="true" --http_authenticators="basic" > --initialize_driver_logging="true" --log_auto_initialize="true" > --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" > --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" > --max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false" > --recovery_agent_removal_limit="100%" --registry="replicated_log" > --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" > --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" > --registry_store_timeout="20secs" --registry_strict="false" > --root_submissions="true" --user_sorter="drf" --version="false" > --webui_dir="/mesos/mesos-1.4.0/src/webui" > --work_dir="/tmp/mesos-9H3Est/master" --zk_session_timeout="10secs" > I0627 16:04:20.689460 8881 master.cpp:492] Master allowing unauthenticated > frameworks to register > I0627 16:04:20.689476 8881 master.cpp:506] Master allowing unauthenticated > agents to register > I0627 16:04:20.689482 8881 master.cpp:520] Master allowing HTTP frameworks > to register without authentication > I0627 16:04:20.689494 8881 credentials.hpp:37] Loading credentials for > authentication from > '/tmp/ExamplesTest_DynamicReservationFramework_uPVIaN/credentials' > W0627 16:04:20.689620 8881 credentials.hpp:52] Permissions on credentials > file '/tmp/ExamplesTest_DynamicReservationFramework_uPVIaN/credentials' are > too open; it is recommended that your credentials file is NOT accessible by > others > I0627 16:04:20.689817 8881 master.cpp:562] Using default 'crammd5' > authenticator > I0627 16:04:20.690021 8881 authenticator.cpp:520] Initializing server SASL > I0627 16:04:20.690615 8878 replica.cpp:676] Replica in STARTING status > received a broadcasted recover request from __req_res__(2)@172.17.0.3:37113 > I0627 16:04:20.691058 8877 recover.cpp:197] Received a recover response from > a replica in STARTING status > I0627 16:04:20.691951 8885 recover.cpp:568] Updating replica status to VOTING > I0627 16:04:20.692548 8881 auxprop.cpp:73] Initialized in-memory auxiliary > property plugin > I0627 16:04:20.692679 8881 master.cpp:642] Authorization enabled > I0627 16:04:20.692991 8872 leveldb.cpp:304] Persisting metadata (8 bytes) to > leveldb took 525491ns > I0627 16:04:20.693245 8872 replica.cpp:322] Persisted replica status to > VOTING > I0627 16:04:20.693567 8882 recover.cpp:582] Successfully joined the Paxos > group > I0627 16:04:20.693084 8876 hierarchical.cpp:169] Initialized hierarchical > allocator process > I0627 16:04:20.693971 8882 recover.cpp:466] Recover process terminated > I0627 16:04:20.693136 8885 whitelist_watcher.cpp:77] No whitelist given > I0627 16:04:20.695565 8847 resolver.cpp:69] Creating default secret resolver > I0627 16:04:20.696056 8847 containerizer.cpp:230] Using isolation: > filesystem/posix,posix/cpu,posix/mem,network/cni,environment_secret > W0627 16:04:20.697029 8847 backend.cpp:76] Failed to create 'aufs' backend: > AufsBackend requires root privileges > W0627 16:04:20.697257 8847 backend.cpp:76] Failed to create 'bind' backend: > BindBackend requires root privileges > I0627 16:04:20.697340 8847 provisioner.cpp:255] Using default backend 'copy' > I0627 16:04:20.702487 8878 slave.cpp:249] Mesos agent started on > (1)@172.17.0.3:37113 > I0627 16:04:20.702535 8878 slave.cpp:250] Flags at startup: > --acls="permissive: true > register_frameworks { > principals { > type: ANY > } > roles { > type: SOME > values: "test" > } > } > " --appc_simple_discovery_uri_prefix="http://" > --appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="false" > --authenticate_http_readwrite="false" --authenticatee="crammd5" > --authentication_backoff_factor="1secs" --authorizer="local" > --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" > --cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" > --cgroups_root="mesos" --container_disk_watch_interval="15secs" > --containerizers="mesos" --default_role="*" --disk_watch_interval="1mins" > --docker="docker" --docker_kill_orphans="true" > --docker_registry="https://registry-1.docker.io" --docker_remove_delay="6hrs" > --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns" > --docker_store_dir="/tmp/mesos/store/docker" > --docker_volume_checkpoint_dir="/var/run/mesos/isolators/docker/volume" > --enforce_container_disk_quota="false" > --executor_registration_timeout="1mins" > --executor_reregistration_timeout="2secs" > --executor_shutdown_grace_period="5secs" > --fetcher_cache_dir="/tmp/mesos-9H3Est/agents/0/fetch" > --fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks" > --gc_disk_headroom="0.1" --hadoop_home="" --help="false" > --hostname_lookup="true" --http_command_executor="false" > --http_heartbeat_interval="30secs" --initialize_driver_logging="true" > --isolation="filesystem/posix,posix/cpu,posix/mem" --launcher="posix" > --launcher_dir="/mesos/mesos-1.4.0/_build/src" --logbufsecs="0" > --logging_level="INFO" --max_completed_executors_per_framework="150" > --oversubscribed_resources_interval="15secs" --perf_duration="10secs" > --perf_interval="1mins" --port="5051" --qos_correction_interval_min="0ns" > --quiet="false" --recover="reconnect" --recovery_timeout="15mins" > --registration_backoff_factor="1secs" --revocable_cpu_low_priority="true" > --runtime_dir="/tmp/mesos-9H3Est/agents/0/run" > --sandbox_directory="/mnt/mesos/sandbox" --strict="true" --switch_user="true" > --systemd_enable_support="true" > --systemd_runtime_directory="/run/systemd/system" --version="false" > --work_dir="/tmp/mesos-9H3Est/agents/0/work" > I0627 16:04:20.706677 8847 resolver.cpp:69] Creating default secret resolver > I0627 16:04:20.707231 8847 containerizer.cpp:230] Using isolation: > filesystem/posix,posix/cpu,posix/mem,network/cni,environment_secret > W0627 16:04:20.708060 8847 backend.cpp:76] Failed to create 'aufs' backend: > AufsBackend requires root privileges > W0627 16:04:20.708179 8847 backend.cpp:76] Failed to create 'bind' backend: > BindBackend requires root privileges > I0627 16:04:20.708231 8847 provisioner.cpp:255] Using default backend 'copy' > I0627 16:04:20.710345 8881 master.cpp:2161] Elected as the leading master! > I0627 16:04:20.710400 8881 master.cpp:1700] Recovering from registrar > I0627 16:04:20.710644 8886 registrar.cpp:345] Recovering registrar > I0627 16:04:20.711997 8881 log.cpp:536] Attempting to start the writer > I0627 16:04:20.714257 8880 slave.cpp:249] Mesos agent started on > (2)@172.17.0.3:37113 > I0627 16:04:20.714516 8880 slave.cpp:250] Flags at startup: > --acls="permissive: true > register_frameworks { > principals { > type: ANY > } > roles { > type: SOME > values: "test" > } > } > " --appc_simple_discovery_uri_prefix="http://" > --appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="false" > --authenticate_http_readwrite="false" --authenticatee="crammd5" > --authentication_backoff_factor="1secs" --authorizer="local" > --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" > --cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" > --cgroups_root="mesos" --container_disk_watch_interval="15secs" > --containerizers="mesos" --default_role="*" --disk_watch_interval="1mins" > --docker="docker" --docker_kill_orphans="true" > --docker_registry="https://registry-1.docker.io" --docker_remove_delay="6hrs" > --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns" > --docker_store_dir="/tmp/mesos/store/docker" > --docker_volume_checkpoint_dir="/var/run/mesos/isolators/docker/volume" > --enforce_container_disk_quota="false" > --executor_registration_timeout="1mins" > --executor_reregistration_timeout="2secs" > --executor_shutdown_grace_period="5secs" > --fetcher_cache_dir="/tmp/mesos-9H3Est/agents/1/fetch" > --fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks" > --gc_disk_headroom="0.1" --hadoop_home="" --help="false" > --hostname_lookup="true" --http_command_executor="false" > --http_heartbeat_interval="30secs" --initialize_driver_logging="true" > --isolation="filesystem/posix,posix/cpu,posix/mem" --launcher="posix" > --launcher_dir="/mesos/mesos-1.4.0/_build/src" --logbufsecs="0" > --logging_level="INFO" --max_completed_executors_per_framework="150" > --oversubscribed_resources_interval="15secs" --perf_duration="10secs" > --perf_interval="1mins" --port="5051" --qos_correction_interval_min="0ns" > --quiet="false" --recover="reconnect" --recovery_timeout="15mins" > --registration_backoff_factor="1secs" --revocable_cpu_low_priority="true" > --runtime_dir="/tmp/mesos-9H3Est/agents/1/run" > --sandbox_directory="/mnt/mesos/sandbox" --strict="true" --switch_user="true" > --systemd_enable_support="true" > --systemd_runtime_directory="/run/systemd/system" --version="false" > --work_dir="/tmp/mesos-9H3Est/agents/1/work" > I0627 16:04:20.715459 8884 replica.cpp:496] Replica received implicit > promise request from __req_res__(3)@172.17.0.3:37113 with proposal 1 > I0627 16:04:20.716228 8884 leveldb.cpp:304] Persisting metadata (8 bytes) to > leveldb took 693410ns > I0627 16:04:20.716259 8884 replica.cpp:344] Persisted promised to 1 > I0627 16:04:20.717954 8875 coordinator.cpp:238] Coordinator attempting to > fill missing positions > I0627 16:04:20.718142 8847 resolver.cpp:69] Creating default secret resolver > I0627 16:04:20.718848 8847 containerizer.cpp:230] Using isolation: > filesystem/posix,posix/cpu,posix/mem,network/cni,environment_secret > W0627 16:04:20.719561 8847 backend.cpp:76] Failed to create 'aufs' backend: > AufsBackend requires root privileges > W0627 16:04:20.719844 8847 backend.cpp:76] Failed to create 'bind' backend: > BindBackend requires root privileges > I0627 16:04:20.719885 8847 provisioner.cpp:255] Using default backend 'copy' > I0627 16:04:20.721727 8872 replica.cpp:391] Replica received explicit > promise request from __req_res__(4)@172.17.0.3:37113 for position 0 with > proposal 2 > I0627 16:04:20.722569 8872 leveldb.cpp:341] Persisting action (8 bytes) to > leveldb took 685913ns > I0627 16:04:20.722607 8872 replica.cpp:711] Persisted action NOP at position > 0 > I0627 16:04:20.724098 8872 slave.cpp:249] Mesos agent started on > (3)@172.17.0.3:37113 > I0627 16:04:20.724123 8872 slave.cpp:250] Flags at startup: > --acls="permissive: true > register_frameworks { > principals { > type: ANY > } > roles { > type: SOME > values: "test" > } > } > " --appc_simple_discovery_uri_prefix="http://" > --appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="false" > --authenticate_http_readwrite="false" --authenticatee="crammd5" > --authentication_backoff_factor="1secs" --authorizer="local" > --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" > --cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" > --cgroups_root="mesos" --container_disk_watch_interval="15secs" > --containerizers="mesos" --default_role="*" --disk_watch_interval="1mins" > --docker="docker" --docker_kill_orphans="true" > --docker_registry="https://registry-1.docker.io" --docker_remove_delay="6hrs" > --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns" > --docker_store_dir="/tmp/mesos/store/docker" > --docker_volume_checkpoint_dir="/var/run/mesos/isolators/docker/volume" > --enforce_container_disk_quota="false" > --executor_registration_timeout="1mins" > --executor_reregistration_timeout="2secs" > --executor_shutdown_grace_period="5secs" > --fetcher_cache_dir="/tmp/mesos-9H3Est/agents/2/fetch" > --fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks" > --gc_disk_headroom="0.1" --hadoop_home="" --help="false" > --hostname_lookup="true" --http_command_executor="false" > --http_heartbeat_interval="30secs" --initialize_driver_logging="true" > --isolation="filesystem/posix,posix/cpu,posix/mem" --launcher="posix" > --launcher_dir="/mesos/mesos-1.4.0/_build/src" --logbufsecs="0" > --logging_level="INFO" --max_completed_executors_per_framework="150" > --oversubscribed_resources_interval="15secs" --perf_duration="10secs" > --perf_interval="1mins" --port="5051" --qos_correction_interval_min="0ns" > --quiet="false" --recover="reconnect" --recovery_timeout="15mins" > --registration_backoff_factor="1secs" --revocable_cpu_low_priority="true" > --runtime_dir="/tmp/mesos-9H3Est/agents/2/run" > --sandbox_directory="/mnt/mesos/sandbox" --strict="true" --switch_user="true" > --systemd_enable_support="true" > --systemd_runtime_directory="/run/systemd/system" --version="false" > --work_dir="/tmp/mesos-9H3Est/agents/2/work" > I0627 16:04:20.725759 8847 sched.cpp:232] Version: 1.4.0 > I0627 16:04:20.727453 8877 replica.cpp:540] Replica received write request > for position 0 from __req_res__(5)@172.17.0.3:37113 > I0627 16:04:20.727560 8877 leveldb.cpp:436] Reading position from leveldb > took 65766ns > I0627 16:04:20.727771 8884 sched.cpp:336] New master detected at > master@172.17.0.3:37113 > I0627 16:04:20.728091 8877 leveldb.cpp:341] Persisting action (14 bytes) to > leveldb took 432923ns > I0627 16:04:20.728114 8877 replica.cpp:711] Persisted action NOP at position > 0 > I0627 16:04:20.728425 8884 sched.cpp:352] No credentials provided. > Attempting to register without authentication > I0627 16:04:20.728458 8884 sched.cpp:836] Sending SUBSCRIBE call to > master@172.17.0.3:37113 > I0627 16:04:20.729207 8876 replica.cpp:694] Replica received learned notice > for position 0 from @0.0.0.0:0 > I0627 16:04:20.729523 8884 sched.cpp:869] Will retry registration in > 1.548772469secs if necessary > I0627 16:04:20.729578 8887 master.cpp:1523] Dropping 'mesos.scheduler.Call' > message since not recovered yet > I0627 16:04:20.729859 8876 leveldb.cpp:341] Persisting action (16 bytes) to > leveldb took 594682ns > I0627 16:04:20.729890 8876 replica.cpp:711] Persisted action NOP at position > 0 > I0627 16:04:20.730602 8882 log.cpp:552] Writer started with ending position 0 > I0627 16:04:20.704807 8878 slave.cpp:553] Agent resources: > [{"name":"cpus","scalar":{"value":16.0},"type":"SCALAR"},{"name":"mem","scalar":{"value":47270.0},"type":"SCALAR"},{"name":"disk","scalar":{"value":3701220.0},"type":"SCALAR"},{"name":"ports","ranges":{"range":[{"begin":31000,"end":32000}]},"type":"RANGES"}] > I0627 16:04:20.716545 8880 slave.cpp:553] Agent resources: > [{"name":"cpus","scalar":{"value":16.0},"type":"SCALAR"},{"name":"mem","scalar":{"value":47270.0},"type":"SCALAR"},{"name":"disk","scalar":{"value":3701220.0},"type":"SCALAR"},{"name":"ports","ranges":{"range":[{"begin":31000,"end":32000}]},"type":"RANGES"}] > I0627 16:04:20.732960 8880 slave.cpp:561] Agent attributes: [ ] > I0627 16:04:20.733045 8878 slave.cpp:561] Agent attributes: [ ] > I0627 16:04:20.730206 8872 slave.cpp:553] Agent resources: > [{"name":"cpus","scalar":{"value":16.0},"type":"SCALAR"},{"name":"mem","scalar":{"value":47270.0},"type":"SCALAR"},{"name":"disk","scalar":{"value":3701220.0},"type":"SCALAR"},{"name":"ports","ranges":{"range":[{"begin":31000,"end":32000}]},"type":"RANGES"}] > I0627 16:04:20.733058 8880 slave.cpp:566] Agent hostname: 089cb2cc2625 > I0627 16:04:20.733126 8872 slave.cpp:561] Agent attributes: [ ] > I0627 16:04:20.733155 8872 slave.cpp:566] Agent hostname: 089cb2cc2625 > I0627 16:04:20.733086 8878 slave.cpp:566] Agent hostname: 089cb2cc2625 > I0627 16:04:20.733321 8879 leveldb.cpp:436] Reading position from leveldb > took 79459ns > I0627 16:04:20.733381 8887 status_update_manager.cpp:177] Pausing sending > status updates > I0627 16:04:20.733424 8887 status_update_manager.cpp:177] Pausing sending > status updates > I0627 16:04:20.733777 8887 status_update_manager.cpp:177] Pausing sending > status updates > I0627 16:04:20.737633 8885 state.cpp:64] Recovering state from > '/tmp/mesos-9H3Est/agents/2/work/meta' > I0627 16:04:20.738603 8874 status_update_manager.cpp:203] Recovering status > update manager > *** Error in > `/mesos/mesos-1.4.0/_build/src/.libs/lt-dynamic-reservation-framework': > double free or corruption (fasttop): 0x00002b7eac002ae0 *** > ../../src/tests/script.cpp:83: Failure > Failed > dynamic_reservation_framework_test.sh terminated with signal Aborted > [ FAILED ] ExamplesTest.DynamicReservationFramework (2904 ms) > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)