[jira] [Commented] (MESOS-9928) OperationReconciliationTest.FrameworkReconciliationRaceWithUpdateSlaveMessage is severely flaky

2019-08-16 Thread Benno Evers (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-9928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16909073#comment-16909073
 ] 

Benno Evers commented on MESOS-9928:


https://reviews.apache.org/r/71297/

> OperationReconciliationTest.FrameworkReconciliationRaceWithUpdateSlaveMessage 
> is severely flaky
> ---
>
> Key: MESOS-9928
> URL: https://issues.apache.org/jira/browse/MESOS-9928
> Project: Mesos
> Issue Type: Bug
> Affects Versions: 1.9.0
> Reporter: Andrei Sekretenko
> Assignee: Benno Evers
> Priority: Major
> Labels: flaky, flaky-test, foundations
>
> Flakes are frequently observed in the internal CI.
> Example:
> {code}
> [ RUN  ] 
> ContentType/OperationReconciliationTest.FrameworkReconciliationRaceWithUpdateSlaveMessage/1
> I0806 20:00:24.128456 29945 cluster.cpp:177] Creating default 'local' 
> authorizer
> I0806 20:00:24.132164 21364 master.cpp:440] Master 
> 7bbcb55d-ce3b-40e6-a605-62ed7d843832 (ip-172-16-10-6.ec2.internal) started on 
> 172.16.10.6:36902
> I0806 20:00:24.132181 21364 master.cpp:443] Flags at startup: --acls="" 
> --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
> --allocation_interval="1secs" --allocator="hierarchical" 
> --authenticate_agents="true" --authenticate_frameworks="true" 
> --authenticate_http_frameworks="true" --authenticate_http_readonly="true" 
> --authenticate_http_readwrite="true" --authentication_v0_timeout="15secs" 
> --authenticators="crammd5" --authorizers="local" 
> --credentials="/tmp/MpmzC4/credentials" --filter_gpu_resources="true" 
> --framework_sorter="drf" --help="false" --hostname_lookup="true" 
> --http_authenticators="basic" --http_framework_authenticators="basic" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" 
> --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
> --max_operator_event_stream_subscribers="1000" 
> --max_unreachable_tasks_per_framework="1000" --memory_profiling="false" 
> --min_allocatable_resources="cpus:0.01|mem:32" --port="5050" 
> --publish_per_framework_metrics="true" --quiet="false" 
> --recovery_agent_removal_limit="100%" --registry="in_memory" 
> --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" 
> --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" 
> --registry_store_timeout="100secs" --registry_strict="false" 
> --require_agent_domain="false" --role_sorter="drf" --root_submissions="true" 
> --version="false" --webui_dir="/usr/local/share/mesos/webui" 
> --work_dir="/tmp/MpmzC4/master" --zk_session_timeout="10secs"
> I0806 20:00:24.132485 21364 master.cpp:492] Master only allowing 
> authenticated frameworks to register
> I0806 20:00:24.132494 21364 master.cpp:498] Master only allowing 
> authenticated agents to register
> I0806 20:00:24.132500 21364 master.cpp:504] Master only allowing 
> authenticated HTTP frameworks to register
> I0806 20:00:24.132506 21364 credentials.hpp:37] Loading credentials for 
> authentication from '/tmp/MpmzC4/credentials'
> I0806 20:00:24.132709 21364 master.cpp:548] Using default 'crammd5' 
> authenticator
> I0806 20:00:24.132845 21364 http.cpp:975] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-master-readonly'
> I0806 20:00:24.132975 21364 http.cpp:975] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-master-readwrite'
> I0806 20:00:24.133085 21364 http.cpp:975] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-master-scheduler'
> I0806 20:00:24.133188 21364 master.cpp:629] Authorization enabled
> I0806 20:00:24.135308 21363 whitelist_watcher.cpp:77] No whitelist given
> I0806 20:00:24.139948 21364 master.cpp:2168] Elected as the leading master!
> I0806 20:00:24.139968 21364 master.cpp:1664] Recovering from registrar
> I0806 20:00:24.140195 21364 registrar.cpp:339] Recovering registrar
> I0806 20:00:24.141042 21364 registrar.cpp:383] Successfully fetched the 
> registry (0B) in 0ns
> I0806 20:00:24.141141 21364 registrar.cpp:487] Applied 1 operations in 
> 25620ns; attempting to update the registry
> I0806 20:00:24.141793 21364 registrar.cpp:544] Successfully updated the 
> registry in 0ns
> I0806 20:00:24.141894 21364 registrar.cpp:416] Successfully recovered 
> registrar
> I0806 20:00:24.142277 21364 master.cpp:1817] Recovered 0 agents from the 
> registry (175B); allowing 10mins for agents to reregister
> I0806 20:00:24.142611 21366 hierarchical.cpp:241] Initialized hierarchical 
> allocator process
> I0806 20:00:24.142735 21366 hierarchical.cpp:280] Skipping recovery of 
> hierarchical allocator: nothing to recover
> W0806 20:00:24.147953 29945 process.cpp:2877] Attempted to spawn already 
> running process 

[jira] [Commented] (MESOS-9928) /OperationReconciliationTest.FrameworkReconciliationRaceWithUpdateSlaveMessage is severely flaky

2019-08-07 Thread Andrei Sekretenko (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-9928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902010#comment-16902010
 ] 

Andrei Sekretenko commented on MESOS-9928:
--

{code:java}
W0806 20:00:24.354037 21365 scheduler.cpp:568] Dropping SUBSCRIBE: Scheduler is in state CONNECTING
{code}
This flake looks simple: the test does not wait for the scheduler to reconnect 
before trying to resubscribe.
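
To illustrate the ordering problem, here is a minimal, self-contained sketch. It does not use the Mesos scheduler library or the test helpers: `ToyScheduler`, `EventQueue`, `awaitCondition` and both `subscribe...` functions are hypothetical names made up for this example, and the event queue is only an assumed model of the asynchronous "connected" callback.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical stand-ins; none of these names come from the Mesos code base.
enum class State { CONNECTING, CONNECTED, SUBSCRIBED };

using EventQueue = std::deque<std::function<void()>>;

class ToyScheduler {
public:
  explicit ToyScheduler(EventQueue& events) : events_(events) {}

  // Begin reconnecting. The connection only completes once the queued
  // event runs, which models an asynchronous "connected" callback.
  void reconnect() {
    state_ = State::CONNECTING;
    events_.push_back([this] { state_ = State::CONNECTED; });
  }

  // Mirrors the logged behaviour: SUBSCRIBE is dropped unless connected.
  bool subscribe() {
    if (state_ != State::CONNECTED) {
      return false;
    }
    state_ = State::SUBSCRIBED;
    return true;
  }

  State state() const { return state_; }

private:
  EventQueue& events_;
  State state_ = State::CONNECTING;
};

// Run queued events until `done` holds or no events remain.
bool awaitCondition(EventQueue& events, const std::function<bool()>& done) {
  while (!done() && !events.empty()) {
    std::function<void()> event = std::move(events.front());
    events.pop_front();
    event();
  }
  return done();
}

// The flaky pattern: resubscribe right after reconnecting.
bool subscribeWithoutWaiting() {
  EventQueue events;
  ToyScheduler scheduler(events);
  scheduler.reconnect();
  return scheduler.subscribe();  // Still CONNECTING, so the call is dropped.
}

// The fixed pattern: wait for the connection before resubscribing.
bool subscribeAfterConnected() {
  EventQueue events;
  ToyScheduler scheduler(events);
  scheduler.reconnect();
  const bool connected = awaitCondition(events, [&scheduler] {
    return scheduler.state() == State::CONNECTED;
  });
  assert(connected);
  return scheduler.subscribe();
}
```

In this model `subscribeWithoutWaiting()` returns false because the SUBSCRIBE is dropped, while `subscribeAfterConnected()` returns true. A fix in the real test would presumably follow the second pattern by awaiting the scheduler's connected callback first, though that is an assumption about the eventual patch.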

> /OperationReconciliationTest.FrameworkReconciliationRaceWithUpdateSlaveMessage
>  is severely flaky
> 
>
> Key: MESOS-9928
> URL: https://issues.apache.org/jira/browse/MESOS-9928
> Project: Mesos
>  Issue Type: Bug
> Reporter: Andrei Sekretenko
> Priority: Major
> Labels: flaky, flaky-test, foundations
>