James Peach created MESOS-9361:
----------------------------------

             Summary: CgroupsIsolatorTest.ROOT_CGROUPS_CreateRecursively always fails.
                 Key: MESOS-9361
                 URL: https://issues.apache.org/jira/browse/MESOS-9361
             Project: Mesos
          Issue Type: Bug
          Components: flaky, test
            Reporter: James Peach


On Fedora 28, the test fails on every run with the following output:

 {noformat}
[ RUN      ] CgroupsIsolatorTest.ROOT_CGROUPS_CreateRecursively
I1029 09:38:31.866564 31397 cgroups.cpp:2838] Freezing cgroup 
/sys/fs/cgroup/freezer/mesos_test_62e0c540-832e-4601-8658-7faa25c427ce
I1029 09:38:31.867048 31398 cgroups.cpp:1229] Successfully froze cgroup 
/sys/fs/cgroup/freezer/mesos_test_62e0c540-832e-4601-8658-7faa25c427ce after 
359936ns
I1029 09:38:31.869033 31397 cgroups.cpp:2856] Thawing cgroup 
/sys/fs/cgroup/freezer/mesos_test_62e0c540-832e-4601-8658-7faa25c427ce
I1029 09:38:31.869357 31403 cgroups.cpp:1258] Successfully thawed cgroup 
/sys/fs/cgroup/freezer/mesos_test_62e0c540-832e-4601-8658-7faa25c427ce after 
261888ns
I1029 09:38:31.884752 31382 cluster.cpp:173] Creating default 'local' authorizer
I1029 09:38:31.892966 31397 master.cpp:413] Master 
0b04a175-fe62-41a1-a387-8d679d1d9609 (jpeach.scv.apple.com) started on 
17.228.8.72:42153
I1029 09:38:31.892992 31397 master.cpp:416] Flags at startup: --acls="" 
--agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
--allocation_interval="1secs" --allocator="hierarchical" 
--authenticate_agents="true" --authenticate_frameworks="true" 
--authenticate_http_frameworks="true" --authenticate_http_readonly="true" 
--authenticate_http_readwrite="true" --authentication_v0_timeout="15secs" 
--authenticators="crammd5" --authorizers="local" 
--credentials="/tmp/mFB69h/credentials" --filter_gpu_resources="true" 
--framework_sorter="drf" --help="false" --hostname_lookup="true" 
--http_authenticators="basic" --http_framework_authenticators="basic" 
--initialize_driver_logging="true" --log_auto_initialize="true" 
--logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" 
--max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
--max_unreachable_tasks_per_framework="1000" --memory_profiling="false" 
--min_allocatable_resources="cpus:0.01|mem:32" --port="5050" 
--publish_per_framework_metrics="true" --quiet="false" 
--recovery_agent_removal_limit="100%" --registry="in_memory" 
--registry_fetch_timeout="1mins" --registry_gc_interval="15mins" 
--registry_max_agent_age="2weeks" --registry_max_agent_count="102400" 
--registry_store_timeout="100secs" --registry_strict="false" 
--require_agent_domain="false" --role_sorter="drf" --root_submissions="true" 
--version="false" --webui_dir="/usr/local/share/mesos/webui" 
--work_dir="/tmp/mFB69h/master" --zk_session_timeout="10secs"
I1029 09:38:31.893931 31397 master.cpp:465] Master only allowing authenticated 
frameworks to register
I1029 09:38:31.893942 31397 master.cpp:471] Master only allowing authenticated 
agents to register
I1029 09:38:31.893951 31397 master.cpp:477] Master only allowing authenticated 
HTTP frameworks to register
I1029 09:38:31.893962 31397 credentials.hpp:37] Loading credentials for 
authentication from '/tmp/mFB69h/credentials'
I1029 09:38:31.894204 31397 master.cpp:521] Using default 'crammd5' 
authenticator
I1029 09:38:31.894359 31397 authenticator.cpp:520] Initializing server SASL
I1029 09:38:31.898878 31397 auxprop.cpp:73] Initialized in-memory auxiliary 
property plugin
I1029 09:38:31.898983 31397 http.cpp:1038] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-readonly'
I1029 09:38:31.899279 31397 http.cpp:1038] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-readwrite'
I1029 09:38:31.899395 31397 http.cpp:1038] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-scheduler'
I1029 09:38:31.899507 31397 master.cpp:602] Authorization enabled
I1029 09:38:31.900339 31406 whitelist_watcher.cpp:77] No whitelist given
I1029 09:38:31.900434 31400 hierarchical.cpp:175] Initialized hierarchical 
allocator process
I1029 09:38:31.908254 31403 master.cpp:2105] Elected as the leading master!
I1029 09:38:31.908313 31403 master.cpp:1660] Recovering from registrar
I1029 09:38:31.908717 31404 registrar.cpp:339] Recovering registrar
I1029 09:38:31.910310 31400 registrar.cpp:383] Successfully fetched the 
registry (0B) in 1.547776ms
I1029 09:38:31.910684 31400 registrar.cpp:487] Applied 1 operations in 
150793ns; attempting to update the registry
I1029 09:38:31.913811 31400 registrar.cpp:544] Successfully updated the 
registry in 2.979072ms
I1029 09:38:31.914028 31400 registrar.cpp:416] Successfully recovered registrar
I1029 09:38:31.914872 31398 master.cpp:1774] Recovered 0 agents from the 
registry (154B); allowing 10mins for agents to reregister
I1029 09:38:31.914912 31406 hierarchical.cpp:215] Skipping recovery of 
hierarchical allocator: nothing to recover
I1029 09:38:31.920753 31382 containerizer.cpp:305] Using isolation { 
network/cni, filesystem/posix, environment_secret, cgroups/mem }
I1029 09:38:31.926185 31382 linux_launcher.cpp:144] Using 
/sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher
I1029 09:38:31.927129 31382 provisioner.cpp:298] Using default backend 'overlay'
W1029 09:38:31.942937 31382 process.cpp:2829] Attempted to spawn already 
running process files@17.228.8.72:42153
I1029 09:38:31.943821 31382 cluster.cpp:485] Creating default 'local' authorizer
I1029 09:38:31.946377 31402 slave.cpp:267] Mesos agent started on 
(1)@17.228.8.72:42153
I1029 09:38:31.946439 31402 slave.cpp:268] Flags at startup: --acls="" 
--appc_simple_discovery_uri_prefix="http://" 
--appc_store_dir="/tmp/CgroupsIsolatorTest_ROOT_CGROUPS_CreateRecursively_VQQM0N/store/appc"
 --authenticate_http_readonly="true" --authenticate_http_readwrite="false" 
--authenticatee="crammd5" --authentication_backoff_factor="1secs" 
--authentication_timeout_max="1mins" --authentication_timeout_min="5secs" 
--authorizer="local" --cgroups_cpu_enable_pids_and_tids_count="false" 
--cgroups_destroy_timeout="1mins" --cgroups_enable_cfs="false" 
--cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" 
--cgroups_root="mesos_test_0ace7d1c-d155-43db-b4f8-df1d91ce4270" 
--container_disk_watch_interval="15secs" --containerizers="mesos" 
--credential="/tmp/CgroupsIsolatorTest_ROOT_CGROUPS_CreateRecursively_VQQM0N/credential"
 --default_role="*" --disallow_sharing_agent_pid_namespace="false" 
--disk_watch_interval="1mins" --docker="docker" --docker_kill_orphans="true" 
--docker_registry="https://registry-1.docker.io" --docker_remove_delay="6hrs" 
--docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns" 
--docker_store_dir="/tmp/CgroupsIsolatorTest_ROOT_CGROUPS_CreateRecursively_VQQM0N/store/docker"
 --docker_volume_checkpoint_dir="/var/run/mesos/isolators/docker/volume" 
--enforce_container_disk_quota="false" --executor_registration_timeout="1mins" 
--executor_reregistration_timeout="2secs" 
--executor_shutdown_grace_period="5secs" 
--fetcher_cache_dir="/tmp/CgroupsIsolatorTest_ROOT_CGROUPS_CreateRecursively_VQQM0N/fetch"
 --fetcher_cache_size="2GB" --fetcher_stall_timeout="1mins" 
--frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1" 
--gc_non_executor_container_sandboxes="false" --help="false" 
--hostname_lookup="true" --http_command_executor="false" 
--http_credentials="/tmp/CgroupsIsolatorTest_ROOT_CGROUPS_CreateRecursively_VQQM0N/http_credentials"
 --http_heartbeat_interval="30secs" --initialize_driver_logging="true" 
--isolation="cgroups/mem" --launcher="linux" 
--launcher_dir="/home/jpeach/upstream/mesos/build/src" --logbufsecs="0" 
--logging_level="INFO" --max_completed_executors_per_framework="150" 
--memory_profiling="false" --network_cni_metrics="true" 
--oversubscribed_resources_interval="15secs" --perf_duration="10secs" 
--perf_interval="1mins" --port="5051" --qos_correction_interval_min="0ns" 
--quiet="false" --reconfiguration_policy="equal" --recover="reconnect" 
--recovery_timeout="15mins" --registration_backoff_factor="10ms" 
--resources="cpus:2;gpus:0;mem:1024;disk:1024;ports:[31000-32000]" 
--revocable_cpu_low_priority="true" 
--runtime_dir="/tmp/CgroupsIsolatorTest_ROOT_CGROUPS_CreateRecursively_VQQM0N" 
--sandbox_directory="/mnt/mesos/sandbox" --strict="true" --switch_user="true" 
--systemd_enable_support="true" 
--systemd_runtime_directory="/run/systemd/system" --version="false" 
--work_dir="/tmp/CgroupsIsolatorTest_ROOT_CGROUPS_CreateRecursively_BEPq5x" 
--xfs_kill_containers="false" --xfs_project_range="[5000-10000]" 
--zk_session_timeout="10secs"
I1029 09:38:31.946923 31402 credentials.hpp:86] Loading credential for 
authentication from 
'/tmp/CgroupsIsolatorTest_ROOT_CGROUPS_CreateRecursively_VQQM0N/credential'
I1029 09:38:31.947036 31402 slave.cpp:300] Agent using credential for: 
test-principal
I1029 09:38:31.947049 31402 credentials.hpp:37] Loading credentials for 
authentication from 
'/tmp/CgroupsIsolatorTest_ROOT_CGROUPS_CreateRecursively_VQQM0N/http_credentials'
I1029 09:38:31.947134 31402 http.cpp:1038] Creating default 'basic' HTTP 
authenticator for realm 'mesos-agent-readonly'
I1029 09:38:31.947352 31402 disk_profile_adaptor.cpp:80] Creating default disk 
profile adaptor module
I1029 09:38:31.949756 31402 slave.cpp:615] Agent resources: 
[{"name":"cpus","scalar":{"value":2.0},"type":"SCALAR"},{"name":"mem","scalar":{"value":1024.0},"type":"SCALAR"},{"name":"disk","scalar":{"value":1024.0},"type":"SCALAR"},{"name":"ports","ranges":{"range":[{"begin":31000,"end":32000}]},"type":"RANGES"}]
I1029 09:38:31.949865 31402 slave.cpp:623] Agent attributes: [  ]
I1029 09:38:31.949882 31402 slave.cpp:632] Agent hostname: jpeach.scv.apple.com
I1029 09:38:31.950031 31401 task_status_update_manager.cpp:181] Pausing sending 
task status updates
I1029 09:38:31.951694 31403 state.cpp:66] Recovering state from 
'/tmp/CgroupsIsolatorTest_ROOT_CGROUPS_CreateRecursively_BEPq5x/meta'
I1029 09:38:31.951923 31402 slave.cpp:6915] Finished recovering checkpointed 
state from 
'/tmp/CgroupsIsolatorTest_ROOT_CGROUPS_CreateRecursively_BEPq5x/meta', 
beginning agent recovery
I1029 09:38:31.952055 31404 task_status_update_manager.cpp:207] Recovering task 
status update manager
I1029 09:38:31.952318 31405 containerizer.cpp:727] Recovering Mesos containers
I1029 09:38:31.952589 31401 linux_launcher.cpp:286] Recovering Linux launcher
I1029 09:38:31.953091 31396 containerizer.cpp:1053] Recovering isolators
E1029 09:38:31.954258 31405 slave.cpp:7275] EXIT with status 1: Failed to 
perform recovery: Collect failed: Collect failed: Failed to list cgroups under 
'/sys/fs/cgroup/memory': Failed to determine canonical path of 
'/sys/fs/cgroup/memory/mesos_test_0ace7d1c-d155-43db-b4f8-df1d91ce4270': No 
such file or directory
If recovery failed due to a change in configuration and you want to
keep the current agent id, you might want to change the
`--reconfiguration_policy` flag to a more permissive value.

To restart this agent with a new agent id instead, do as follows:
rm -f 
/tmp/CgroupsIsolatorTest_ROOT_CGROUPS_CreateRecursively_BEPq5x/meta/slaves/latest
This ensures that the agent does not recover old live executors.

If you use the Docker containerizer and think that the Docker
daemon state is broken, you can try to clear it. But be careful:
these commands will erase all containers and images from this host,
not just those started by Mesos!
docker kill $(docker ps -q)
docker rm $(docker ps -a -q)
docker rmi $(docker images -q)

Finally, restart the agent.

../../3rdparty/libprocess/include/process/gmock.hpp:247: ERROR: this mock 
object (used in test CgroupsIsolatorTest.ROOT_CGROUPS_CreateRecursively) should 
be deleted but never is. Its address is @0x56491dd05de8.
../../src/tests/mock_registrar.cpp:54: ERROR: this mock object (used in test 
CgroupsIsolatorTest.ROOT_CGROUPS_CreateRecursively) should be deleted but never 
is. Its address is @0x56491e267c00.
../../src/tests/containerizer/cgroups_isolator_tests.cpp:737: ERROR: this mock 
object (used in test CgroupsIsolatorTest.ROOT_CGROUPS_CreateRecursively) should 
be deleted but never is. Its address is @0x7ffca24acd10.
ERROR: 3 leaked mock objects found at program exit.
{noformat}
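
The fatal line in the log is the recovery error: the agent tries to canonicalize the path of its cgroups root under '/sys/fs/cgroup/memory' and gets "No such file or directory". The snippet below is only a rough sketch of that failure mode, not Mesos code. The helper name `canonical_cgroup_path` is hypothetical, and a temporary directory stands in for the real memory hierarchy so that it can run without root. It assumes the Mesos code path behaves like a strict canonical-path lookup, which is what the error message suggests.

```python
import errno
import tempfile
import uuid
from pathlib import Path


def canonical_cgroup_path(hierarchy, cgroup):
    """Return the canonical path of `cgroup` under `hierarchy`.

    Hypothetical helper for illustration only. With strict=True,
    resolve() raises FileNotFoundError (ENOENT) if the path is missing.
    """
    return str((Path(hierarchy) / cgroup).resolve(strict=True))


def demo():
    """Look up a cgroup directory that was never created; return the errno."""
    # The temporary directory is a stand-in for /sys/fs/cgroup/memory.
    with tempfile.TemporaryDirectory() as hierarchy:
        missing = "mesos_test_" + str(uuid.uuid4())
        try:
            canonical_cgroup_path(hierarchy, missing)
        except FileNotFoundError as e:
            return e.errno
    return None


if __name__ == "__main__":
    print(demo() == errno.ENOENT)
```

If this is what happens during recovery, the question for this ticket is why the test's cgroups root was never created (or was already removed) under the memory hierarchy before the isolator tried to list it.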



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
