[jira] [Commented] (MESOS-9928) OperationReconciliationTest.FrameworkReconciliationRaceWithUpdateSlaveMessage is severely flaky
[ https://issues.apache.org/jira/browse/MESOS-9928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16909073#comment-16909073 ]

Benno Evers commented on MESOS-9928:

https://reviews.apache.org/r/71297/

> OperationReconciliationTest.FrameworkReconciliationRaceWithUpdateSlaveMessage
> is severely flaky
> ---
>
> Key: MESOS-9928
> URL: https://issues.apache.org/jira/browse/MESOS-9928
> Project: Mesos
> Issue Type: Bug
> Affects Versions: 1.9.0
> Reporter: Andrei Sekretenko
> Assignee: Benno Evers
> Priority: Major
> Labels: flaky, flaky-test, foundations
>
> Flakes are frequently observed in the internal CI.
> Example:
> {code}
> [ RUN ]
> ContentType/OperationReconciliationTest.FrameworkReconciliationRaceWithUpdateSlaveMessage/1
> I0806 20:00:24.128456 29945 cluster.cpp:177] Creating default 'local'
> authorizer
> I0806 20:00:24.132164 21364 master.cpp:440] Master
> 7bbcb55d-ce3b-40e6-a605-62ed7d843832 (ip-172-16-10-6.ec2.internal) started on
> 172.16.10.6:36902
> I0806 20:00:24.132181 21364 master.cpp:443] Flags at startup: --acls=""
> --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins"
> --allocation_interval="1secs" --allocator="hierarchical"
> --authenticate_agents="true" --authenticate_frameworks="true"
> --authenticate_http_frameworks="true" --authenticate_http_readonly="true"
> --authenticate_http_readwrite="true" --authentication_v0_timeout="15secs"
> --authenticators="crammd5" --authorizers="local"
> --credentials="/tmp/MpmzC4/credentials" --filter_gpu_resources="true"
> --framework_sorter="drf" --help="false" --hostname_lookup="true"
> --http_authenticators="basic" --http_framework_authenticators="basic"
> --initialize_driver_logging="true" --log_auto_initialize="true"
> --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5"
> --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000"
> --max_operator_event_stream_subscribers="1000"
> --max_unreachable_tasks_per_framework="1000" --memory_profiling="false"
> --min_allocatable_resources="cpus:0.01|mem:32" --port="5050"
> --publish_per_framework_metrics="true" --quiet="false"
> --recovery_agent_removal_limit="100%" --registry="in_memory"
> --registry_fetch_timeout="1mins" --registry_gc_interval="15mins"
> --registry_max_agent_age="2weeks" --registry_max_agent_count="102400"
> --registry_store_timeout="100secs" --registry_strict="false"
> --require_agent_domain="false" --role_sorter="drf" --root_submissions="true"
> --version="false" --webui_dir="/usr/local/share/mesos/webui"
> --work_dir="/tmp/MpmzC4/master" --zk_session_timeout="10secs"
> I0806 20:00:24.132485 21364 master.cpp:492] Master only allowing
> authenticated frameworks to register
> I0806 20:00:24.132494 21364 master.cpp:498] Master only allowing
> authenticated agents to register
> I0806 20:00:24.132500 21364 master.cpp:504] Master only allowing
> authenticated HTTP frameworks to register
> I0806 20:00:24.132506 21364 credentials.hpp:37] Loading credentials for
> authentication from '/tmp/MpmzC4/credentials'
> I0806 20:00:24.132709 21364 master.cpp:548] Using default 'crammd5'
> authenticator
> I0806 20:00:24.132845 21364 http.cpp:975] Creating default 'basic' HTTP
> authenticator for realm 'mesos-master-readonly'
> I0806 20:00:24.132975 21364 http.cpp:975] Creating default 'basic' HTTP
> authenticator for realm 'mesos-master-readwrite'
> I0806 20:00:24.133085 21364 http.cpp:975] Creating default 'basic' HTTP
> authenticator for realm 'mesos-master-scheduler'
> I0806 20:00:24.133188 21364 master.cpp:629] Authorization enabled
> I0806 20:00:24.135308 21363 whitelist_watcher.cpp:77] No whitelist given
> I0806 20:00:24.139948 21364 master.cpp:2168] Elected as the leading master!
> I0806 20:00:24.139968 21364 master.cpp:1664] Recovering from registrar
> I0806 20:00:24.140195 21364 registrar.cpp:339] Recovering registrar
> I0806 20:00:24.141042 21364 registrar.cpp:383] Successfully fetched the
> registry (0B) in 0ns
> I0806 20:00:24.141141 21364 registrar.cpp:487] Applied 1 operations in
> 25620ns; attempting to update the registry
> I0806 20:00:24.141793 21364 registrar.cpp:544] Successfully updated the
> registry in 0ns
> I0806 20:00:24.141894 21364 registrar.cpp:416] Successfully recovered
> registrar
> I0806 20:00:24.142277 21364 master.cpp:1817] Recovered 0 agents from the
> registry (175B); allowing 10mins for agents to reregister
> I0806 20:00:24.142611 21366 hierarchical.cpp:241] Initialized hierarchical
> allocator process
> I0806 20:00:24.142735 21366 hierarchical.cpp:280] Skipping recovery of
> hierarchical allocator: nothing to recover
> W0806 20:00:24.147953 29945 process.cpp:2877] Attempted to spawn already
> runn
[jira] [Commented] (MESOS-9928) /OperationReconciliationTest.FrameworkReconciliationRaceWithUpdateSlaveMessage is severely flaky
[ https://issues.apache.org/jira/browse/MESOS-9928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902010#comment-16902010 ]

Andrei Sekretenko commented on MESOS-9928:
--

{code:java}
W0806 20:00:24.354037 21365 scheduler.cpp:568] Dropping SUBSCRIBE: Scheduler is in state CONNECTING
{code}

This flake looks simple: the test does not wait for the scheduler to reconnect before trying to resubscribe.

> /OperationReconciliationTest.FrameworkReconciliationRaceWithUpdateSlaveMessage
> is severely flaky
>
>
> Key: MESOS-9928
> URL: https://issues.apache.org/jira/browse/MESOS-9928
> Project: Mesos
> Issue Type: Bug
> Reporter: Andrei Sekretenko
> Priority: Major
> Labels: flaky, flaky-test, foundations
>
> Flakes are frequently observed in the internal CI.
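
The race described in the comment above (a SUBSCRIBE sent while the scheduler is still in state CONNECTING gets dropped) can be illustrated with a small standalone sketch. This is not Mesos code and not the fix from the linked review: `ToyScheduler`, `reconnect()`, `subscribe()`, and the 50 ms delay are all hypothetical stand-ins built on the C++ standard library, assumed here only to show why waiting for the reconnect to finish removes the flake.

```cpp
#include <chrono>
#include <future>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a scheduler client. NOT the Mesos API.
class ToyScheduler {
public:
  enum class State { DISCONNECTED, CONNECTING, CONNECTED };

  ~ToyScheduler() {
    if (worker_.joinable()) worker_.join();
  }

  // Starts an asynchronous reconnect. The returned future becomes ready
  // only after the state has switched to CONNECTED.
  std::future<void> reconnect() {
    if (worker_.joinable()) worker_.join();
    {
      std::lock_guard<std::mutex> lock(mutex_);
      state_ = State::CONNECTING;
    }
    connected_ = std::promise<void>();
    std::future<void> ready = connected_.get_future();
    worker_ = std::thread([this] {
      // Assumed connection latency, for illustration only.
      std::this_thread::sleep_for(std::chrono::milliseconds(50));
      {
        std::lock_guard<std::mutex> lock(mutex_);
        state_ = State::CONNECTED;
      }
      connected_.set_value();
    });
    return ready;
  }

  // Mirrors the logged behaviour: the call is dropped (returns false)
  // unless the scheduler is already CONNECTED.
  bool subscribe() {
    std::lock_guard<std::mutex> lock(mutex_);
    return state_ == State::CONNECTED;
  }

private:
  std::mutex mutex_;
  State state_ = State::DISCONNECTED;
  std::promise<void> connected_;
  std::thread worker_;
};

// Flaky pattern: subscribe right after triggering the reconnect. The
// outcome depends on timing; usually the scheduler is still CONNECTING.
bool subscribeWithoutWaiting() {
  ToyScheduler scheduler;
  scheduler.reconnect();  // Future discarded: nothing waits for CONNECTED.
  return scheduler.subscribe();
}

// Deterministic pattern: block until the reconnect has completed, then
// subscribe.
bool subscribeAfterWaiting() {
  ToyScheduler scheduler;
  std::future<void> connected = scheduler.reconnect();
  connected.wait();
  return scheduler.subscribe();
}
```

Calling `subscribeAfterWaiting()` always succeeds because the future is satisfied only after the state change, whereas the result of `subscribeWithoutWaiting()` depends on thread timing, which is the kind of nondeterminism that shows up as a flaky test.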